At a fundamental level, the classical picture of the world is dead, and has been dead now for almost a century. 
Pinning down exactly which quantum phenomena are responsible for this has proved to be a tricky and 
controversial question, but a lot of progress has been made in the past few decades. We now have a range of 
precise statements showing that whatever the ultimate laws of Nature are, they cannot be classical. In this 
article, we review results on the fundamental phenomena of quantum theory that cannot be understood in 
classical terms. We proceed by first granting quite a broad notion of classicality, describe a range of quantum 
phenomena (such as randomness, discreteness, the indistinguishability of states, measurement-uncertainty, 
measurement-disturbance, complementarity, noncommutativity, interference, the no-cloning theorem, and the 
collapse of the wave-packet) that do fall under its liberal scope, and then finally describe some aspects of 
quantum physics that can never admit a classical understanding - the intrinsically quantum mechanical aspects 
of Nature. The most famous of these is Bell’s theorem, but we also review two more recent results in this area. 
Firstly, Hardy’s theorem shows that even a finite-dimensional quantum system must contain an infinite amount 
of information, and secondly, the Pusey-Barrett-Rudolph theorem shows that the wave-function must be an 
objective property of an individual quantum system. Besides being of foundational interest, results of this sort 
now find surprising practical applications in areas such as quantum information science and the simulation of 
quantum systems. 


1. Introduction 

We are constantly told that quantum theory has revolutionized our understanding of the universe,
and reveals a strange new world, radically different from classical Newtonian mechanics - 
cats can be both alive and dead; particles can disappear and reappear behind the moon; spooky 
action-at-a-distance causes instantaneous effects at the other side of the universe; measuring 
one observable disturbs the value of another in a strange, conspiratorial way. But one can press 
the matter beyond the overused lines of newspaper articles and pop-science accounts, and ask: 
which quantum phenomena unequivocally force us to discard long-held, classical conceptions 
of the universe in the same way that a constant speed of light for all local observers forces us 
to discard the notion of absolute time? Which phenomena of quantum theory are intrinsically 
non-classical? 

We might quickly point to things like “wave-functions” and “Hilbert space” [1-3], but these 
are simply technical features of the mathematics of quantum theory, and on their own shed no 
light on how the physics of quantum mechanics radically departs from the classical realm. Indeed,
long ago Koopman and von Neumann showed that classical mechanics itself can be formulated 
in Hilbert space [4-6]. 

Discretization of observables, such as atomic energy levels, does not really present any dramatic 
shift in our perspectives - one can easily define a classical phase space that is discrete in positions 
and momenta with permutations for dynamics. Another answer might be that quantum reality 
is fundamentally probabilistic and that “uncertainty is hard-wired into physics”, but again, is
this really such a radical departure from classicality? The behaviour of the particles in a hot 
cup of coffee is massively unpredictable, and we have absolutely no chance of describing them 
in any precise sense - should we view quantum physics as an exaggerated form of statistical 
randomness, perhaps in which we now have an irreducibly poor resolution of the complicated 
underlying details? 

Yet another response might be that the collapse of the wave-function allows us to instantaneously
cause the state of the rest of the universe to change entirely. However, consider the
following scenario: both you and a friend (who lives on the other side of the galaxy) have been
given a box each. You are both told that one box contains a gold coin while the other box contains
a silver coin. Before you open your box, you are completely ignorant as to what is in your
particular box, and so you can only predict that with probability 1/2 you have gold, and with
probability 1/2 you have silver. Moreover, correlations exist: you know that if you have a silver
coin then your friend has a gold coin and conversely if you have gold then they have silver. You
open the box, and to your delight, discover that you have gold. At that same instant you also
know that the box on the other side of the galaxy must contain silver. An instantaneous collapse
of the probability distribution has taken place! Are quantum entanglement and the collapse of the
wave-function simply exaggerated forms of this probabilistic updating?

The aim of this review is really twofold: firstly to show that the above phenomena are not the 
features of quantum physics that overthrow a classical conception of the world, and secondly to 
identify a range of deeper phenomena that do. 

Of course, in order for us to provide meaningful answers we must commit to some minimal 
notion of classicality. The simple criterion that guides us is the following: 

If a phenomenon of quantum physics also occurs within a classical statistical physics setting,
perhaps with minor additional assumptions that don’t violently clash with our everyday conceptions,
then it should not be viewed as an intrinsically quantum mechanical phenomenon.

This informal condition provides a standard for how surprised we should be by any quantum 
phenomenon. The term “everyday conceptions” is intentionally vague at this point, and ultimately
depends on what the reader deems “classically reasonable”. However, the key point here
is that the more liberal you are with “classically reasonable”, the stricter you are about which
aspects of quantum theory challenge your classical conception of the world. In what follows we
adopt a fairly generous notion of “classicality”, or equivalently, we adopt a high standard for
what we call “intrinsically quantum”¹. We start by first fleshing out the above notion of classicality,
and then exhibiting a range of quantum phenomena that, on their own, do not seriously
challenge our classical conceptions. After mapping out these “classical fragments” of quantum 
theory, we then identify those quantum phenomena that forever banish the classical realm. 


1.1. Overview 

This review covers several interrelated facets of the foundations of quantum theory, which at 
times can become quite abstract and subtle. To avoid confusion, we start with a rough outline. 

The purpose of §2 is to show that the quantum phenomena of measurement-disturbance, 
complementarity, randomness, collapse of the wavepacket, and others, also appear in classical 
statistical mechanics supplemented with minor additional assumptions. §2.1 contains the core 
concepts, and is largely self-contained. §2.2 provides a more physical model that makes a direct 
connection to quantum physics. The conclusion of this is that Gaussian quantum physics is
essentially classical in nature (see Appendix D for a definition of Gaussian quantum physics).
§2.3 and §2.4 analyse the no-cloning theorem and wavepacket collapse in the Einstein-Podolsky-Rosen
experiment within this model, showing that they too are essentially classical.


¹Ultimately the formal definition of “classical” will be that the theory is a “local, non-contextual theory in which non-orthogonal pure states are represented by overlapping statistical distributions defined on some state space Λ”.



August 4, 2015 0:35 Contemporary Physics contemp-phys-review 



Leading on from this, §2.5 and §2.6 discuss what it means for quantum phenomena to have a 
classical statistical model in general. This sets the scene for discussing intrinsically quantum-mechanical
phenomena. These sections are a bit more abstract and technical, so a casual reader
should just take the core message from §2.5 that a probabilistic framework allows us to
place quantum theory in a broader context, and in doing so contrast it with other theories such 
as classical theory. 

§3 identifies three intrinsically quantum-mechanical phenomena. §3.1 reviews Bell’s theorem.
This is a mostly self-contained discussion, and can be read on its own with only a rough overview 
of §2. The take-home message is that the correlations obtained from measuring entangled states 
force us into a dilemma: either abandon basic notions of realism or abandon the relativistic 
notion that influences cannot travel faster than light. §3.2 discusses Hardy’s theorem. This 
requires understanding the basic framework of §2.5, and the take-home message is that quantum 
systems display seemingly contradictory properties of being both continuous and discrete and, 
contrary to traditional statements, it is the continuity which is puzzling. Finally, §3.3 discusses 
the recent Pusey-Barrett-Rudolph theorem, which shows that the wave-function must be an 
objective property of an individual system. This requires a little more background from §2.5 as 
well as a basic understanding of §2.6. 


2. Classical Fragments of Quantum Theory 

It is clear that quantum mechanics must accommodate some kind of emergent “classical properties”,
and some kind of “classical regime”, in which Newtonian mechanics is recovered as a
limiting case. However our goal here is not to describe a classical limit, but rather to set up a 
sufficiently broad notion of classicality so that anything which does not fall under this notion 
must be deemed intrinsically quantum in character. 

In this section we show that classical fragments exist within quantum theory, i.e. there is a
range of phenomena in quantum physics that also appear in classical statistical physics supplemented
with assumptions that do not violently clash with our intuitions about classical physics.
However, if one tries to stretch this classical framework across all of quantum physics, then “classically
unreasonable” features always appear - quantum physics can never be fit cleanly into a
classical framework. By carving out these classical fragments and delineating their boundaries, 
we can identify the genuinely non-classical aspects of Nature. 


2.1. A toy classical universe with some odd features 

Let us begin with an extremely simple statistical mechanics example due to Spekkens [7] that 
captures some of the conceptual problems we face in identifying genuinely quantum phenomena. 
Imagine a classical particle that can be in one of four possible states. For example, we might 
imagine a box with four internal cells to it, and the particle is located in one of the cells. For 
simplicity we can imagine that the cells form a 2 x 2 grid, which allows us to label these cells 
via discrete coordinates as (0,0), (0,1), (1,0), (1,1). These are the exact states, or microstates, 
of the system (see Figure 1). 

However, now suppose that the box is so small that all our measurement devices are blunt, 
clumsy probes that only return coarse answers as to where the particle is actually located. 
Instead of precise microstates, we are forced to use statistical macrostates, which are probability 
distributions p = (P(0,0), P(0,1), P(1,0), P(1,1)), where P(j,k) is the probability that the particle is in
the cell (j,k).

So far everything is elementary classical statistical mechanics of a simple system, which requires 
exactly two bits of data, j and k, to specify the microstate of the particle. However, suppose 
we now conjure up a new fundamental law for this toy-universe [7], and make the following 
assumption about our ultimate ability to determine the microstate of the particle:
Resolution Restriction (RR-condition): It is impossible to possess more than a single 
bit of information about the microstate of the classical system. 

In other words, the macrostate is allowed to be fully random, p = (1/4,1/4,1/4,1/4), or to have a 
single sharp coordinate, j say, so that p = (1/2,1/2, 0, 0) is allowed. However, it cannot have any 
weaker form of randomness; for example, the sharp microstate p = (1,0,0,0) is fundamentally 
excluded from the toy-theory. 

There are six extremal macrostates in the theory, which are the six minimal uncertainty states 
shown in Figure 1. 
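As a quick check of the counting behind the RR-condition, the sketch below lists the six extremal macrostates (each barred state being the complement of its unbarred partner) and measures the information each one carries as 2 bits minus the Shannon entropy of p. That entropy-based measure, and the names `MACROSTATES` and `info_bits`, are illustrative assumptions rather than the toy model's formal definition.

```python
from fractions import Fraction
from math import log2

# Cell order matches p = (P(0,0), P(0,1), P(1,0), P(1,1)).
h = Fraction(1, 2)

# The six minimal-uncertainty macrostates; each barred state is the complement of its partner.
MACROSTATES = {
    "a": (h, h, 0, 0), "a_bar": (0, 0, h, h),
    "b": (h, 0, h, 0), "b_bar": (0, h, 0, h),
    "c": (h, 0, 0, h), "c_bar": (0, h, h, 0),
}

def info_bits(p):
    """Bits known about the microstate, taken here as 2 minus the Shannon entropy of p."""
    entropy = -sum(float(q) * log2(q) for q in p if q > 0)
    return 2 - entropy

for name, p in MACROSTATES.items():
    print(name, info_bits(p))                      # 1.0 bit for every extremal macrostate
print("sharp", info_bits((1, 0, 0, 0)))            # 2.0 bits: excluded by the RR-condition
print("random", info_bits((Fraction(1, 4),) * 4))  # 0.0 bits: allowed
```

Each extremal macrostate sits exactly at the one-bit limit, while the sharp microstate would require two bits.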


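[Figure 1: a 2 × 2 grid of cells (0,0), (0,1), (1,0), (1,1), where (j,k) labels a microstate of the system and the shaded region marks a macrostate; panels show a = (1/2, 1/2, 0, 0), b = (1/2, 0, 1/2, 0), c = (1/2, 0, 0, 1/2), and their barred partners such as c̄ = (0, 1/2, 1/2, 0).]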

Figure 1. Extremal macrostates within the toy theory: a shaded cell denotes that the particle is in that cell with
probability 1/2. Exactly two bits j, k ∈ {0,1} are required to specify the microstate (j,k) of the system; however, if
at most one bit about (j,k) is attainable, then there are six minimal uncertainty macrostates {a, b, c, ā, b̄, c̄} within the
toy theory. Every other macrostate is a probabilistic combination of these macrostates. Note that each vertical pair of
macrostates has zero overlap, and so the two are perfectly distinguishable alternatives. Put another way, each pair defines a single
answerable “yes-no” question in the universe, which explains the use of the barred and unbarred notation (e.g. “a” = a and
“not a” = ā).

The key thing to note is that while clearly not a fundamental part of classical mechanics, this 
RR-condition does not dramatically overthrow our classical conceptions - it simply describes a 
classical scenario where we have a bound on our resolving power. However despite such simplicity, 
the RR-condition has a range of surprising consequences for the physics of this toy-universe. For a 
start, it implies that the six extremal macrostates of the theory cannot be perfectly distinguished. 
For example, if the system is prepared either in the macrostate a or the macrostate b, and you 
do not know which, then there is no procedure that will tell you which is the case with certainty. 
This is because the distributions a and b overlap - they each assign probability 1/2 to the 
microstate (0,0). Therefore, if the system happens to occupy this microstate, which will happen 
with probability 1/2, then there is nothing you can possibly do to distinguish a from b. This 
parallels the fact that non-orthogonal quantum states, such as the $|{\uparrow_z}\rangle$ and $|{\uparrow_x}\rangle$ states of
a spin-1/2 particle [1, 2, 8], cannot be perfectly distinguished. Secondly, the RR-condition not 
only places restrictions on what can be measured, but it also implies that all measurements in 
the toy-universe must uncontrollably disturb the particle. 
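The overlap argument can be made quantitative. For two equally likely macrostates, the best possible probability of correctly guessing which one was prepared is (1 + D)/2, where D is the total-variation distance between the two distributions. The sketch below (the function name is hypothetical) evaluates this for a versus b and for a versus ā.

```python
from fractions import Fraction

h = Fraction(1, 2)
# Cell order matches p = (P(0,0), P(0,1), P(1,0), P(1,1)).
a, a_bar, b = (h, h, 0, 0), (0, 0, h, h), (h, 0, h, 0)

def best_guess_probability(p, q):
    """Optimal success probability for telling apart two equally likely distributions."""
    tv = sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2  # total-variation distance
    return (1 + tv) / 2

print(best_guess_probability(a, b))      # 3/4: a and b overlap on cell (0,0)
print(best_guess_probability(a, a_bar))  # 1: disjoint supports, perfectly distinguishable
```

The value 3/4 matches the argument above: with probability 1/2 the particle sits in the shared cell (0,0) and one can only guess, while otherwise the occupied cell identifies the macrostate.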

Since we want to respect the RR-condition in this toy-universe in the simplest way, it is 
reasonable to assume that any idealized measurement that we can perform obeys the following 
two conditions: 

• Consistency with the RR-condition: Whenever the system starts in a macrostate that 
obeys the RR-condition, it must end up in a macrostate that still obeys the RR-condition 
after the measurement has been performed and we have recorded the outcome. 

• Repeatability: When a measurement is performed and a certain outcome is obtained then 
repeating the measurement immediately afterwards should yield the same outcome [2]. 

Given these two conditions, it is easy to see that there is no measurement that reveals exactly 
which cell the particle is in. Suppose such a measurement were allowed and consider, for example,
what would happen if its outcome revealed that the system was in the (0,0) cell. Because of 
repeatability, the system must remain in the (0,0) cell after the measurement, since otherwise 
there would be some probability of obtaining a different result upon repeating the measurement. 
However, this is incompatible with the RR-condition because it would leave us with full information
about which cell the particle is in. We conclude that measurements in the toy universe 
must necessarily only reveal coarse-grained information about the microstate. 

Whilst we cannot do a measurement that tells us which exact cell (j,k) the particle is in, it
turns out that we are allowed to do a coarse-grained measurement A, with outcomes a and ā, that
returns one of two answers:

Outcome A = a: “The particle is in (0, 0) or (0,1)” 

Outcome A = ā: “The particle is in (1,0) or (1,1)”. 

Now consider a situation in which the system is prepared in the macrostate p = b = 
(1/2, 0, 1/2, 0), namely half the time the particle is in the cell (0,0), and the other half of the
time it is in the cell (1,0).

If we perform the measurement A then we will get the outcome a half the time. Crucially, if the 
measurement does not cause a disturbance to the system, then getting this outcome would allow 
us to conclude that the particle must be in the (0, 0) cell because it had zero probability of being 
in the (0,1) cell to begin with. To avoid this violation of the RR-condition, the particle must 
sometimes get kicked into another cell during the measurement procedure. By repeatability, the 
only cell it can get kicked to is (0,1) because this is the only other cell that gives the A = a 
outcome with certainty. In fact, to satisfy both the RR-condition and repeatability for any valid
starting distribution, upon obtaining the A = a outcome, half the time the particle must remain
where it is, and the other half the time the microstates (0,0) and (0,1) must be swapped. Thus,
starting in the macrostate p = b before the measurement, the measurement disturbs the system
and, given the outcome A = a, sends it to p' = a = (1/2,1/2,0,0) (see Figure 2). Measurement-disturbance
necessarily exists in the physics of this toy universe. Indeed, the only macrostates that are not disturbed
by measuring A are the macrostates a and ā.
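The disturbance rule just derived can be written out directly. The sketch below assumes the cell ordering (0,0), (0,1), (1,0), (1,1) used for p, and its helper names are hypothetical: obtaining an outcome leaves the particle in place half the time and swaps the two cells of that outcome the other half, and averaging the two cases gives the post-measurement macrostate.

```python
from fractions import Fraction

# Cell order matches p = (P(0,0), P(0,1), P(1,0), P(1,1)).
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Measurement A: outcome "a" covers (0,0),(0,1); outcome "a_bar" covers (1,0),(1,1).
A = {"a": ((0, 0), (0, 1)), "a_bar": ((1, 0), (1, 1))}

def measure(p, measurement):
    """Return {outcome: (probability, post-measurement macrostate)}.

    On an outcome with cells (c1, c2), the particle stays put with probability 1/2
    and has c1 and c2 swapped with probability 1/2.
    """
    dist = dict(zip(CELLS, p))
    results = {}
    for outcome, (c1, c2) in measurement.items():
        prob = dist[c1] + dist[c2]
        if prob == 0:
            continue  # this outcome never occurs
        stay = {cell: (dist[cell] if cell in (c1, c2) else 0) for cell in CELLS}
        swap = dict(stay)
        swap[c1], swap[c2] = stay[c2], stay[c1]
        # Average the two cases and renormalize by the outcome probability.
        post = tuple(Fraction(stay[cell] + swap[cell]) / (2 * prob) for cell in CELLS)
        results[outcome] = (prob, post)
    return results

h = Fraction(1, 2)
b = (h, 0, h, 0)
for outcome, (prob, post) in measure(b, A).items():
    print(outcome, prob, [str(q) for q in post])
```

Starting from b, each outcome occurs with probability 1/2 and leaves the system in a or ā respectively; starting from a, the outcome a is certain and the macrostate is unchanged.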


[Figure 2: a measurement of A takes the initial macrostate b to the macrostate a.]


Figure 2. Measurements are always disturbing in the toy-model: consider the minimal uncertainty macrostate b =
(1/2, 0, 1/2, 0), which respects the RR-condition. Suppose a measurement A is performed on the macrostate, and randomly
outputs A = a: “top left or bottom left” (the dashed green region); then it must necessarily disturb the macrostate in
a probabilistic way in order to preserve both the RR-condition and repeatability of measurements. The system is updated
to the new macrostate a = (1/2, 1/2, 0, 0).


In addition to A, there are two other extremal measurements in the toy theory: B distinguishes 
(0,0) and (1,0) from (0,1) and (1,1), while C distinguishes (0, 0) and (1,1) from (0,1) and (1,0). 
Each of these measurements necessarily induces a disturbance of a similar type to a measurement 
of A. The three extremal measurements are illustrated in Figure 3. 

This measurement disturbance is entirely classical; however, it leads to phenomena that are
familiar in quantum mechanics² - for example, the measurement of spin along the x-axis for
a spin-1/2 particle disturbs all quantum states except the eigenstates $|{\uparrow_x}\rangle$ and $|{\downarrow_x}\rangle$. As it
turns out, the classical measurements A, B, C on the macrostates {a, b, c, ā, b̄, c̄} have probabilities
and disturbance patterns that perfectly mimic the three quantum spin measurements along
the x-, y- and z-directions performed on the 6 different eigenstates $\{|{\uparrow_s}\rangle, |{\downarrow_s}\rangle\}$ for s = x, y, z. The
macrostates in the toy-model also display a form of complementarity in terms of the 3 measurements
A, B and C. For example, the macrostates a and ā have deterministic outcomes for
measurement A; however, the outcomes of measurements B and C are fully random on these
macrostates (see [2] or [8] for careful discussions of complementarity in quantum theory).

²In appendices B-D, we review those aspects of quantum mechanics that are relevant to this article.

Figure 3. Extremal Measurements: The 3 sharpest possible measurements permitted within the toy-universe are denoted
A, B, C. Each measurement has two probabilistic outcomes, with each outcome resulting in the preparation of one of the 6
minimal uncertainty macrostates (within the dotted boxes). For example, performing the measurement A on the completely
random macrostate (1/4, 1/4, 1/4, 1/4) prepares the macrostate a when we get the outcome A = a, and prepares the macrostate ā
when we get the outcome A = ā. Each of these outcomes occurs with probability 1/2.
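The complementarity pattern can be tabulated. The sketch below computes, for each of the six extremal macrostates, the probability of the unbarred outcome of A, B and C, using the cell sets given in the text for each measurement; the helper names are hypothetical.

```python
from fractions import Fraction

h = Fraction(1, 2)
# Cell order matches p = (P(0,0), P(0,1), P(1,0), P(1,1)).
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
STATES = {
    "a": (h, h, 0, 0), "a_bar": (0, 0, h, h),
    "b": (h, 0, h, 0), "b_bar": (0, h, 0, h),
    "c": (h, 0, 0, h), "c_bar": (0, h, h, 0),
}
# Cells that give the unbarred outcome of each extremal measurement.
FIRST_OUTCOME = {
    "A": {(0, 0), (0, 1)},
    "B": {(0, 0), (1, 0)},
    "C": {(0, 0), (1, 1)},
}

def prob_unbarred(state, m):
    """Probability that measurement m returns its unbarred outcome on this macrostate."""
    return sum(q for cell, q in zip(CELLS, state) if cell in FIRST_OUTCOME[m])

for name, state in STATES.items():
    print(name, [str(prob_unbarred(state, m)) for m in "ABC"])
```

Each macrostate gives a certain outcome (probability 0 or 1) for exactly one of the three measurements and probability 1/2 for the other two, the same pattern as spin eigenstates along x, y and z.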

The disturbance induced by measurements also implies that the results obtained in a sequence 
of measurements depend on the order in which the measurements are made. Such a dependence is 
often taken to be a signature of non-commutativity in quantum mechanics, so in this sense we can 
say that the classical measurements in the toy model are non-commutative. For example, if we 
make two A measurements in a row, one immediately after the other, then, by repeatability, we 
will get the same result both times. However, if B is measured between the two A measurements, 
then the disturbance it induces will cause the outcome of the second A measurement to be totally 
random, and uncorrelated with the first A measurement. 
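Order dependence can be checked by chaining measurements. The sketch below uses the same disturbing rule as before, in the equivalent form that an outcome leaves the particle uniformly spread over that outcome's two cells, and enumerates every outcome history for a given sequence. The fully random starting macrostate and the function names are illustrative choices.

```python
from fractions import Fraction

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
A = {"a": [(0, 0), (0, 1)], "a_bar": [(1, 0), (1, 1)]}
B = {"b": [(0, 0), (1, 0)], "b_bar": [(0, 1), (1, 1)]}

def measure(p, measurement):
    """Outcome probabilities and post-measurement macrostates.

    Averaging "stay" and "swap the two cells" leaves the particle uniformly
    spread over the two cells of the observed outcome.
    """
    dist = dict(zip(CELLS, p))
    results = {}
    for outcome, cells in measurement.items():
        prob = sum(dist[c] for c in cells)
        if prob:
            post = tuple(Fraction(1, 2) if c in cells else Fraction(0) for c in CELLS)
            results[outcome] = (prob, post)
    return results

def sequence_probs(p, measurements, history=(), weight=Fraction(1)):
    """Joint probability of every outcome history for measurements applied in order."""
    if not measurements:
        return {history: weight}
    probs = {}
    for outcome, (prob, post) in measure(p, measurements[0]).items():
        probs.update(sequence_probs(post, measurements[1:], history + (outcome,), weight * prob))
    return probs

uniform = (Fraction(1, 4),) * 4
print(sequence_probs(uniform, [A, A]))
print(sequence_probs(uniform, [A, B, A]))
```

With A measured twice, only the histories (a, a) and (ā, ā) occur; with B in between, all eight histories of the sequence A, B, A have probability 1/8, so the second A outcome is uncorrelated with the first.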

It is vital to emphasize that this example is most definitely not trying to show that quantum 
theory is actually some funny classical model, but simply that the phenomena of measurement 
uncertainty and measurement disturbance in quantum physics also arise within a classical model 
with simple additional assumptions (a bound on the sharpness of classical measurements) that 
do not overthrow our classical conception of the world. Therefore, according to the previously 
stated notion of classicality, measurement uncertainty, complementarity, noncommutativity, and 
measurement disturbance are not viewed as intrinsically quantum mechanical phenomena, and 
so we must search deeper. 

Because the 3 sharpest measurements on the 6 extremal macrostates turn out to have identical
statistics to elementary quantum states and measurements, the toy-theory also mimics
interference - in spite of it being a classical statistical theory. At the simplest level, interference
is our ability to reversibly evolve a quantum state such as $|\Psi\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ to another
state $|\Psi'\rangle = |0\rangle$, where $\{|0\rangle, |1\rangle\}$ is an orthonormal basis for a 2-dimensional, qubit system (see
appendix C for details). The evolution acts linearly on $|\Psi\rangle$ and is defined by the transformations
$|0\rangle \mapsto (|0\rangle + |1\rangle)/\sqrt{2}$ and $|1\rangle \mapsto (|0\rangle - |1\rangle)/\sqrt{2}$. Under such an evolution the $|0\rangle$-component of $|\Psi\rangle$
is enhanced (constructive interference), while the $|1\rangle$-component of $|\Psi\rangle$ is eliminated (destructive
interference) [1, 8]. If we were only keeping track of the measurement statistics of measuring in
the basis $\{|0\rangle, |1\rangle\}$, then the above interference would be described by the reversible evolution
of measurement statistics (1/2, 1/2) to/from (1, 0).
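The quantum example can be reproduced numerically. The sketch below uses NumPy and the matrix whose columns are the images of |0⟩ and |1⟩ given above (the Hadamard gate), and tracks only the statistics of a measurement in the {|0⟩, |1⟩} basis.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
psi = (ket0 + ket1) / np.sqrt(2)

# Columns are the images of |0> and |1> under the evolution described in the text.
H = np.column_stack([(ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)])

def stats(state):
    """Outcome probabilities for a measurement in the {|0>, |1>} basis."""
    return np.abs(state) ** 2

print(stats(psi))          # approximately [0.5, 0.5]
print(stats(H @ psi))      # approximately [1, 0]: constructive on |0>, destructive on |1>
print(stats(H @ H @ psi))  # approximately [0.5, 0.5] again: the evolution is reversible
```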

Such behaviour is perfectly mimicked within the classical toy model. For example, the minimal 
uncertainty macrostate a gives rise to (1/2,1/2) outcome statistics for the B measurement, but 
it can be converted in a reversible way (deterministically shuffle the cells around) into any of the 
other minimal uncertainty macrostates. The reversible transformation of the cells (0,1) ↔ (1,0)
transforms a into b and hence, the measurement statistics of B go from (1/2,1/2) to (1,0). If we 
simply look at the measurement statistics, this is indistinguishable from the quantum example of
interference described above. Again, it is important to note that we are not claiming that general
quantum interference is classical, merely that a similar phenomenon can appear in the reversible
dynamics of classical statistical models. A simple notion of “interference” is therefore perhaps too
blunt an answer to the question of which quantum phenomenon is intrinsically non-classical, and
requires more precision.
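The classical counterpart is just as short. The sketch below applies the cell permutation (0,1) ↔ (1,0) to the macrostate a and reads off the statistics of the B measurement; the cell ordering and helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Cell order matches p = (P(0,0), P(0,1), P(1,0), P(1,1)).
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
h = Fraction(1, 2)
a, b = (h, h, 0, 0), (h, 0, h, 0)
B_FIRST = {(0, 0), (1, 0)}  # cells that give outcome b of measurement B

def swap_01_10(p):
    """Reversible cell permutation (0,1) <-> (1,0); the other cells are left alone."""
    d = dict(zip(CELLS, p))
    d[(0, 1)], d[(1, 0)] = d[(1, 0)], d[(0, 1)]
    return tuple(d[c] for c in CELLS)

def b_stats(p):
    """Outcome statistics (P(b), P(b_bar)) of the B measurement."""
    pb = sum(q for c, q in zip(CELLS, p) if c in B_FIRST)
    return (pb, 1 - pb)

print(b_stats(a))              # statistics (1/2, 1/2) before the shuffle
print(b_stats(swap_01_10(a)))  # statistics (1, 0) after it
```

Applying the same permutation a second time returns a, so the change in statistics is reversible, just as in the quantum case.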

Remarkably, the above classical toy-model exhibits an array of other phenomena traditionally 
associated with quantum mechanics, such as the collapse of the wave-packet, state teleportation 
[9] and the impossibility of cloning states [10, 11]. All these striking phenomena stem solely from
a single restriction on classical resolving power. Instead of describing all of them within the toy-model,
we shall upgrade the basic RR-condition to obtain a more natural and intuitive scenario,
in which these phenomena are more vivid and which generates a genuine classical fragment of quantum
theory.


2.2. Gaussian states and operations are a classical fragment of quantum physics 

The toy model of a classical particle in a box, subject to a single constraint on resolving power, 
leads to a mimicking of phenomena that are traditionally deemed quantum mechanical in character.
However, this simple classical model can now be magnified into a more surprising result
that has direct contact with quantum mechanics [12].

Consider the phase space of a classical system [13], parameterized by variables (x,p), but now 
imagine discretizing the phase space into boxes with sides of length L and imposing a Resolution 
Restriction condition such that our Liouville distributions can never have smaller support than 
some limited number of boxes over the phase space. Such a simplistic approach would be clunky 
in that it would depend on an arbitrary way of partitioning phase space. To get around this 
we can instead impose an RR-constraint at the level of the expectation values of canonical 
coordinates. When we do so, we find that the resulting physical theory makes exactly the same 
predictions as Gaussian quantum mechanics³.

For simplicity, we restrict attention to a single classical particle moving in one spatial dimension,
but the construction can easily be generalized to any classical system. The particle has
microstates $(x,p) \in \mathbb{R}^2$ that make up the system’s phase space. A statistical Liouville distribution
is then a probability distribution $f$ on the phase space with $f(x,p) \geq 0$ and such that
$\int dx\,dp\, f(x,p) = 1$. From this we can compute the statistical properties of the system, such as
the expectation value of position $\langle x\rangle = \int dx\,dp\, f(x,p)\,x$, or the expectation value of momentum
$\langle p\rangle = \int dx\,dp\, f(x,p)\,p$.

In classical mechanics there is no limit on how sharp the predictions of $f$ can be - we can know
the precise microstate to any degree of accuracy. An RR-constraint can be imposed by limiting
how small the fluctuations about the mean $(\langle x\rangle, \langle p\rangle)$ can be. The form that this constraint
should take is motivated by the symmetries and structure of phase space, e.g. we want the
constraint to be preserved by classical time evolution of the system. Since classical dynamics
may cause fluctuations in $x$ to be transferred into fluctuations in $p$ and vice versa, and may
induce correlations between $x$ and $p$, it is natural to impose the constraint on fluctuations in
$x$, $p$, and their correlations taken all together. To do this we construct a fluctuation matrix $\gamma$,
which is given in terms of position and momentum by


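$$
\gamma = \begin{pmatrix} (\Delta x)^2 & \langle xp\rangle - \langle x\rangle\langle p\rangle \\ \langle xp\rangle - \langle x\rangle\langle p\rangle & (\Delta p)^2 \end{pmatrix}, \tag{1}
$$

where $(\Delta x)^2 = \langle x^2\rangle - \langle x\rangle^2$ and $(\Delta p)^2 = \langle p^2\rangle - \langle p\rangle^2$ are the variances of the position and
momentum in the particular Liouville distribution $f$.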
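To make Eq. (1) concrete, the sketch below discretizes phase space on a grid, builds a normalized Liouville distribution, and evaluates the means and the fluctuation matrix γ by Riemann sums. The particular distribution (a correlated Gaussian with a chosen mean and covariance), the grid extent and the resolution are arbitrary assumptions, made only so that the result can be checked against known values.

```python
import numpy as np

# Illustrative choice (assumption): a correlated Gaussian Liouville distribution.
mean = np.array([0.5, -1.0])
true_gamma = np.array([[1.0, 0.3],
                       [0.3, 0.5]])

# Phase-space grid, wide enough that f is negligible at the edges.
x = np.linspace(-8.0, 8.0, 801)
p = np.linspace(-8.0, 8.0, 801)
X, P = np.meshgrid(x, p, indexing="ij")
dx, dp = x[1] - x[0], p[1] - p[0]

Z = np.stack([X - mean[0], P - mean[1]], axis=-1)
inv = np.linalg.inv(true_gamma)
f = np.exp(-0.5 * np.einsum("...i,ij,...j->...", Z, inv, Z))
f /= f.sum() * dx * dp  # normalize so that f integrates to 1 over phase space

def expect(g):
    """Riemann-sum approximation of the phase-space average of g(x, p) under f."""
    return np.sum(f * g) * dx * dp

mx, mp = expect(X), expect(P)
cov_xp = expect(X * P) - mx * mp
gamma = np.array([[expect(X**2) - mx**2, cov_xp],
                  [cov_xp, expect(P**2) - mp**2]])
print(np.round(gamma, 4))
```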


³We give a short account of Gaussian states and operations in appendix D. See [14] for a more in-depth review.
Using the matrix $\gamma$, we can define an RR-condition that restricts how small the fluctuations
in $f$ can get. An elegant way to do this is to demand that $\gamma$ obey the matrix inequality

$$
\gamma + \lambda C \geq 0, \tag{2}
$$

for some $2 \times 2$ matrix $C$, and some constant scale $\lambda \geq 0$ that measures the size of the “boxes” on
phase space, where Eq. (2) means that the eigenvalues of $\gamma + \lambda C$ are all $\geq 0$. Since the eigenvalues
of $\gamma$ itself are always $\geq 0$, the case $\lambda = 0$ corresponds to switching off the RR-constraint, and so
$\lambda$ controls the level of fluctuations within the toy classical universe.

The question of which matrix C to use is more subtle, but we want to choose it such that 
Eq. (2) is preserved under classical time evolution, i.e. if it holds at time t then it should also 
hold at any time t' > t under any dynamics allowed by classical mechanics. It turns out that 
setting C = iS, where i is the imaginary unit and 


$$
S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \tag{3}
$$

does the trick, so the final RR-condition is

$$
\gamma + i\lambda S \geq 0, \tag{4}
$$


for some fixed minimal resolving scale $\lambda$ on the classical phase space. See appendix A for more 
details of this construction. 
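The RR-condition of Eq. (4) is easy to test numerically, and the claimed stability under time evolution can be checked in the simplest case. The sketch below covers only linear dynamics z → Mz with z = (x, p), under which γ → MγMᵀ. Linear Hamiltonian flows are symplectic (MSMᵀ = S), and then M(γ + iλS)Mᵀ = MγMᵀ + iλS, so positivity carries over. Restricting to linear maps, and the particular maps and helper names used, are assumptions of this illustration, not the general argument of appendix A.

```python
import numpy as np

S = np.array([[0.0, -1.0], [1.0, 0.0]])

def satisfies_rr(gamma, lam, tol=1e-12):
    """Eq. (4): all eigenvalues of the Hermitian matrix gamma + i*lam*S are >= 0."""
    eigs = np.linalg.eigvalsh(gamma + 1j * lam * S)
    return bool(np.all(eigs >= -tol))

def evolve(gamma, M):
    """Fluctuation matrix after the linear phase-space map z -> M z."""
    return M @ gamma @ M.T

lam = 1.0
gamma_ok = lam * np.eye(2)         # minimal-uncertainty macrostate on the boundary
gamma_bad = 0.5 * lam * np.eye(2)  # too sharp: violates the RR-condition

t, theta = 2.0, 0.7
free_flight = np.array([[1.0, t], [0.0, 1.0]])  # x -> x + t p (unit mass)
squeeze = np.diag([3.0, 1.0 / 3.0])
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

for M in (free_flight, squeeze, rotation):
    assert np.allclose(M @ S @ M.T, S)              # M is symplectic
    print(satisfies_rr(evolve(gamma_ok, M), lam))   # the condition survives the evolution
print(satisfies_rr(gamma_bad, lam))                 # the too-sharp macrostate fails
```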

Finally, because we are following a statistical mechanics account of the physics [15, 16], for a
given fluctuation matrix $\gamma$, we use the Gibbsian distribution $f$ that maximizes the thermodynamic
entropy $S = -\int dx\,dp\, f(x,p) \log f(x,p)$. Thus the scenario we have described is precisely
one of classical statistical mechanics where our classical resolving power is bounded in phase
space by a scale $\lambda$. Indeed, Eq. (4) implies that $\Delta x\,\Delta p \geq \lambda$, and so we find that the RR-condition
encodes a classical uncertainty relation on the statistical system.
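The step from Eq. (4) to the uncertainty relation can be spelled out. Writing cov(x,p) = ⟨xp⟩ − ⟨x⟩⟨p⟩ for the off-diagonal entry of Eq. (1) (a shorthand introduced here), and using the fact that a 2 × 2 Hermitian matrix is positive semidefinite exactly when its diagonal entries and its determinant are nonnegative:

$$
\gamma + i\lambda S =
\begin{pmatrix}
(\Delta x)^2 & \operatorname{cov}(x,p) - i\lambda \\
\operatorname{cov}(x,p) + i\lambda & (\Delta p)^2
\end{pmatrix} \geq 0
\quad\Longrightarrow\quad
(\Delta x)^2 (\Delta p)^2 - \operatorname{cov}(x,p)^2 - \lambda^2 \geq 0,
$$

so that $(\Delta x)^2 (\Delta p)^2 \geq \lambda^2 + \operatorname{cov}(x,p)^2 \geq \lambda^2$, i.e. $\Delta x\,\Delta p \geq \lambda$. For fixed means and a fixed $\gamma$ (with $\det\gamma > 0$), the entropy-maximizing distribution is the Gaussian

$$
f(z) = \frac{1}{2\pi\sqrt{\det\gamma}} \exp\!\left[-\tfrac{1}{2}(z - \bar z)^{\mathsf T}\gamma^{-1}(z - \bar z)\right],
\qquad z = (x,p),\quad \bar z = (\langle x\rangle, \langle p\rangle).
$$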

The RR-condition can be interpreted as a kind of externally imposed complementarity between
position and momentum for the classical system, and, as with the previous toy model,
we have measurement-disturbance; localizing the position of the particle must randomly disturb 
its momentum in order to maintain repeatability and the RR-condition. Once again, the theory 
displays a type of “interference” in terms of its macrostates in the sense described earlier, and 
the structure of the extremal macrostates of the theory is surprisingly rich. 

To what degree does this mimic quantum mechanical phenomena? What classical fragment of 
quantum theory does this model define? The answer is perhaps surprising: the above scenario,
in which we use classical statistical mechanics under the RR-condition Eq. (4), is precisely
isomorphic to Gaussian quantum mechanics [12, 14] (see appendix D for a definition of Gaussian
quantum mechanics). In other words:

If Nature only prepared Gaussian quantum states, and only performed Gaussian evolution
and Gaussian measurements, then classical statistical mechanics with a single resolving
constraint (with $\lambda = \hbar/2$) would perfectly reproduce all physical predictions.

The proof that all of the features of Gaussian quantum mechanics are reproduced is an involved 
computation, and we refer the reader to [12] for the details.
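With λ = ħ/2, Eq. (4) becomes the uncertainty condition obeyed by covariance matrices of Gaussian quantum states (the sign convention of S does not matter here, since γ is real). The sketch below checks it for a few standard single-mode Gaussian states, in units with ħ = 1 and unit mass and frequency, where the vacuum has (Δx)² = (Δp)² = ħ/2; these units and the helper names are assumptions made for illustration.

```python
import numpy as np

HBAR = 1.0      # units with hbar = 1 and m = omega = 1 (assumption for illustration)
LAM = HBAR / 2  # the resolving scale lambda = hbar/2 quoted in the text
S = np.array([[0.0, -1.0], [1.0, 0.0]])

def satisfies_rr(gamma, lam=LAM, tol=1e-12):
    """Eq. (4) with lambda = hbar/2: gamma + i*lam*S must be positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(gamma + 1j * lam * S) >= -tol))

def vacuum():
    return (HBAR / 2) * np.eye(2)

def squeezed_vacuum(r):
    return (HBAR / 2) * np.diag([np.exp(-2 * r), np.exp(2 * r)])

def thermal(nbar):
    return (nbar + 0.5) * HBAR * np.eye(2)

for name, g in [("vacuum", vacuum()), ("squeezed", squeezed_vacuum(0.8)), ("thermal", thermal(2.0))]:
    dx, dp = np.sqrt(np.diag(g))
    print(name, satisfies_rr(g), round(dx * dp / LAM, 6))
```

The vacuum and the squeezed vacuum sit exactly on the boundary (ΔxΔp = λ), while the thermal state lies strictly inside the allowed region.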

Thankfully, there is more to life than Gaussian physics, but this correspondence still tells us some
useful things. Firstly, it shows that Liouville mechanics under the RR-condition reproduces all
quantum phenomena that are present in Gaussian quantum mechanics [14]. This includes teleportation,
superdense coding [17], remote steering [18], secure-key distribution [19], no-cloning
[10, 11], and the collapse of the wave-packet (we discuss some of these shortly). However, the 
real value of the result is that it narrows our hunt and tells us that the intrinsically non-classical 
phenomena, such as Bell non-locality [20, 21], quantum computation [22] and contextuality [23], 
must necessarily be non-Gaussian in nature. 

It should also be emphasized that in a precise sense there is no “middle-ground” between 
Gaussian quantum mechanics and the full set of quantum operations. Specifically, the set of 
unitary transformations that describe the dynamics in Gaussian quantum mechanics are all those
of the form $e^{-iHt/\hbar}$, where the Hamiltonian $H$ is at most quadratic in the canonical coordinates
$q$ and $p$. Now imagine that we have a single non-quadratic term $H_*$ that can be added to the
Hamiltonian and which can be switched on or off whenever we want. By adding this single $H_*$ to
the set of quadratic Hamiltonians, the set of unitary transformations we can achieve explodes to 
become the full set of unitaries on the Hilbert space [24]. This means that the Gaussian fragment 
is in a sense the largest classical fragment we can obtain by following this line, and must rest 
right up against genuine non-classicality. 
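As a rough classical illustration of this explosion, offered only as a sketch, the Python snippet below works with monomials $q^a p^b$ under the Poisson bracket (the classical analogue of the commutator, used here as a stand-in for the quantum statement of [24]) and tracks the highest polynomial degree that repeated brackets can reach. Starting from the quadratic generators $q^2$, $p^2$ and $qp$, the degree never rises above 2; adding the single cubic term $q^3$ (our hypothetical choice of $H_*$) makes it grow every round. The exponent-pair encoding and the helper names `bracket` and `max_degree_per_round` are our own.

```python
from itertools import product


def bracket(m1, m2):
    """Poisson bracket {q^a p^b, q^c p^d} = (a*d - b*c) q^(a+c-1) p^(b+d-1).

    A monomial q^a p^b is encoded as the exponent pair (a, b).
    Returns (coefficient, monomial)."""
    (a, b), (c, d) = m1, m2
    return a * d - b * c, (a + c - 1, b + d - 1)


def max_degree_per_round(generators, rounds):
    """Close a set of monomials under Poisson brackets one round at a time,
    recording the largest total degree present after each round."""
    current = set(generators)
    history = []
    for _ in range(rounds):
        new = set(current)
        for m1, m2 in product(current, repeat=2):
            coeff, m = bracket(m1, m2)
            if coeff != 0:  # a zero coefficient means the bracket vanishes
                new.add(m)
        current = new
        history.append(max(a + b for a, b in current))
    return history


quadratic = [(2, 0), (0, 2), (1, 1)]  # q^2, p^2, qp
print(max_degree_per_round(quadratic, 4))             # [2, 2, 2, 2]: closed
print(max_degree_per_round(quadratic + [(3, 0)], 4))  # with q^3 the degree keeps growing
```

The monomial encoding is enough here because the bracket of two monomials is again a monomial, so the algebra generated by a set of monomials is spanned by the monomials this closure finds.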


2.3. No-cloning is a classical statistical phenomenon 

It is often maintained that the impossibility of cloning quantum information is a distinctly quantum 
mechanical phenomenon. Formally, the no-cloning theorem [10, 11] says that it is impossible to 
construct a physical device that, on input of any quantum state $|\Psi\rangle$, will return the duplicated 
state $|\Psi\rangle \otimes |\Psi\rangle$. Indeed, if such a magical device existed then one could even violate relativity 
and signal faster than light.^

The proof of the no-cloning theorem in quantum theory is very straightforward. Suppose a 
device existed that could clone two nonorthogonal and nonidentical states $|\Psi\rangle, |\Phi\rangle \in \mathcal{H}_S$, where 
$\mathcal{H}_S$ is the Hilbert space of the system. Any physically allowed transformation in quantum theory 
is described by a unitary operation $U$ on the joint Hilbert space $\mathcal{H}_S \otimes \mathcal{H}_A$ of the primary system 
and some apparatus system, which together form a closed system. For a device that clones $|\Psi\rangle$ 
and $|\Phi\rangle$, this transformation must satisfy 

$$U(|\Psi\rangle \otimes |\Xi\rangle) = |\Psi\rangle \otimes |\Psi\rangle, \tag{5}$$

$$U(|\Phi\rangle \otimes |\Xi\rangle) = |\Phi\rangle \otimes |\Phi\rangle, \tag{6}$$

where $|\Xi\rangle$ is the initial state of the apparatus. We can now compute the inner product of the 
output states $U(|\Psi\rangle \otimes |\Xi\rangle)$ and $U(|\Phi\rangle \otimes |\Xi\rangle)$ in two different ways. Firstly, using Eqs. (5) and (6), we have 

$$\left|(\langle\Phi| \otimes \langle\Xi|)\, U^\dagger U\, (|\Psi\rangle \otimes |\Xi\rangle)\right| = |\langle\Phi|\Psi\rangle\langle\Phi|\Psi\rangle| = |\langle\Phi|\Psi\rangle|^2. \tag{7}$$

Alternatively, we can use the fact that $U$ is a reversible unitary evolution, and so $U^\dagger U = \mathbb{1}$, where 
$\mathbb{1}$ is the identity operator, to give 

$$\left|(\langle\Phi| \otimes \langle\Xi|)\, U^\dagger U\, (|\Psi\rangle \otimes |\Xi\rangle)\right| = |\langle\Phi|\Psi\rangle\langle\Xi|\Xi\rangle| = |\langle\Phi|\Psi\rangle|, \tag{8}$$

where the last equality follows from the fact that $|\Xi\rangle$ is a unit vector. In other words, a unitary 
transformation preserves inner products. Equating the two expressions gives $|\langle\Phi|\Psi\rangle| = |\langle\Phi|\Psi\rangle|^2$, 
which is only satisfied when $|\langle\Phi|\Psi\rangle|$ is either 0 or 1. However, since we assumed that $|\Psi\rangle$ and 
$|\Phi\rangle$ are nonidentical and nonorthogonal, this is a contradiction, and thus there is no physically 
allowed cloning device in quantum theory. 
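The two computations in Eqs. (7) and (8) can also be checked numerically. The Python sketch below assumes NumPy; the two qubit states, the apparatus state $|\Xi\rangle = |0\rangle$ and the helper `random_unitary` are illustrative choices of ours. It confirms that an arbitrary unitary on the joint space preserves the inner product, whereas perfect cloning would replace the overlap $|\langle\Phi|\Psi\rangle|$ by its square.

```python
import numpy as np


def random_unitary(n, rng):
    """Random n x n unitary from the QR decomposition of a complex Gaussian matrix,
    with column phases fixed so that the result is Haar distributed."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


rng = np.random.default_rng(0)
psi = np.array([1.0, 0.0], dtype=complex)                  # |Psi>
phi = np.array([np.cos(0.4), np.sin(0.4)], dtype=complex)  # |Phi>, nonorthogonal to |Psi>
xi = np.array([1.0, 0.0], dtype=complex)                   # apparatus state |Xi> (assumed)

U = random_unitary(4, rng)  # an arbitrary joint evolution on H_S (x) H_A
inner_before = np.vdot(np.kron(phi, xi), np.kron(psi, xi))
inner_after = np.vdot(U @ np.kron(phi, xi), U @ np.kron(psi, xi))

overlap = abs(np.vdot(phi, psi))                                     # right-hand side of Eq. (8)
cloned_overlap = abs(np.vdot(np.kron(phi, phi), np.kron(psi, psi)))  # right-hand side of Eq. (7)
print(abs(inner_after - inner_before))  # rounding-level: U preserves the inner product
print(overlap, cloned_overlap)          # cos(0.4) versus cos(0.4)**2: they differ
```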

Whilst classical bits can be freely copied, this only applies to the exact values of the bits 
themselves. In light of the theories presented so far, it is perhaps better to think of a pure 
quantum state as analogous to a probability distribution over the values of classical bits, and 
these cannot be cloned. We find that the imposition of the RR-condition on the above classical 
statistical model generates precisely the same prohibition as in quantum theory: it is impossible 
to build a cloning device within this classical theory. 

^The rough idea is that if Alice repeatedly clones one half of an entangled state that has been collapsed by a remote 
measurement made by Bob on the other half of the state, then she can magnify the information as to what type of state she 
possesses (e.g. whether it is a momentum eigenstate or a position eigenstate). Bob can use this to signal faster than light by 
choosing which type of state to collapse Alice’s system to (e.g. momentum eigenstates = “yes”, position eigenstates = “no”). 

The proof parallels the quantum proof, and relies only on a basic property of Hamiltonian 
dynamics [12]. Suppose a classical device existed that could clone two overlapping but 
nonidentical Liouville distributions $f_S(x,p)$ and $g_S(x,p)$ defined on a classical system $S$. Let $C$ be a 
second classical “apparatus” of equal size to $S$ that is initialized in some fiducial macrostate 
$h_C(x',p')$. The composite system $SC$ undergoes some Hamiltonian dynamics on the underlying 
microstates, which is assumed to clone $f_S$ and $g_S$. The initial joint state is thus either 
$f_{SC}(x,p;x',p') = f_S(x,p)\,h_C(x',p')$ or $g_{SC}(x,p;x',p') = g_S(x,p)\,h_C(x',p')$, where $f_S$ and $g_S$ 
are the input macrostates of the system, and $f_{SC}$ and $g_{SC}$ are defined on the joint phase space 
of $S$ and $C$. If this evolution is a cloning process then, under the Hamiltonian dynamics, we must 
have 

$$f_S(x,p)\,h_C(x',p') \longrightarrow f_S(x,p)\,f_C(x',p'), \tag{9}$$

$$g_S(x,p)\,h_C(x',p') \longrightarrow g_S(x,p)\,g_C(x',p'), \tag{10}$$

where $f_S$ and $f_C$ are identical distributions, and similarly for $g_S$ and $g_C$.
To mirror the inner product computation of quantum states, we use a classical measure [25] 
of how much two statistical distributions overlap. The only fact we need about the dynamics 
is the following: if $f$ and $g$ are two distributions on phase space, then the overlap integral 
$F(f,g) = \int dx\,dp\,\sqrt{f(x,p)}\sqrt{g(x,p)}$ is constant in time. Here, $F$ is the classical “fidelity” 
measure, quantifying the degree to which the distributions overlap: if $f = g$ then they overlap fully 
and $F = 1$, while if they have zero overlap then $F = 0$. Put another way, Hamiltonian dynamics 
evolves phase space distributions in a volume-preserving way, similar to an incompressible fluid, 
which implies that the fidelity is also preserved. 
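For Gaussian distributions the fidelity has a closed form, $F = e^{-D_B}$ with $D_B$ the Bhattacharyya distance, which makes its conservation easy to check. The sketch below assumes NumPy; the means, the covariance $\tfrac{1}{2}\mathbb{1}$ (in units with $\hbar = 1$) and the free-particle flow $x \to x + pt$, $p \to p$ for unit mass are illustrative choices of ours, not taken from [12]. It pushes two overlapping phase-space Gaussians through this linear Hamiltonian flow, whose Jacobian has unit determinant, and compares the fidelity before and after.

```python
import numpy as np


def gaussian_fidelity(mu1, cov1, mu2, cov2):
    """Classical fidelity F(f, g) = integral of sqrt(f g) for two Gaussian densities,
    computed from the closed-form Bhattacharyya distance D_B as F = exp(-D_B)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = 0.5 * (cov1 + cov2)
    delta = mu1 - mu2
    d_b = 0.125 * delta @ np.linalg.solve(cov, delta) + 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return float(np.exp(-d_b))


def evolve(mu, cov, S):
    """Push a Gaussian phase-space density through the linear map (x, p) -> S (x, p)."""
    return S @ mu, S @ cov @ S.T


cov = 0.5 * np.eye(2)  # minimum-uncertainty-like spread (assumed, hbar = 1)
f_mu, g_mu = np.array([0.0, 0.0]), np.array([1.0, 0.0])

# Free particle with H = p^2 / 2 for a time t: x -> x + p t, p -> p (det S = 1)
t = 2.0
S = np.array([[1.0, t], [0.0, 1.0]])

F_before = gaussian_fidelity(f_mu, cov, g_mu, cov)
F_after = gaussian_fidelity(*evolve(f_mu, cov, S), *evolve(g_mu, cov, S))
print(F_before, F_after)  # equal up to rounding: exp(-0.25) in both cases
```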

Now, we can compute the overlap of the two final states in two different ways. Firstly, using 
Eqs. (9) and (10), we have 

$$F(f_S f_C, g_S g_C) = F(f_S, g_S)\,F(f_C, g_C) = F(f_S, g_S)^2. \tag{11}$$

However, since Hamiltonian dynamics preserves fidelity, we can alternatively compute the fidelity 
of the initial states, which is 

$$F(f_S h_C, g_S h_C) = F(f_S, g_S)\,F(h_C, h_C) = F(f_S, g_S), \tag{12}$$

since $F(h_C, h_C) = 1$ for the normalized distribution $h_C$. In parallel to the quantum case, these 
two equations can only be satisfied if $F(f_S, g_S) = 0$ or $1$, but, since we assumed $f_S$ and $g_S$ are 
overlapping and nonidentical, this is a contradiction. 
We conclude that it is fundamentally impossible to construct a device that clones overlapping 
statistical distributions within the classical theory. Therefore the no-cloning theorem applies not 
only to quantum theory, but also to classical statistical mechanics. Hence, it should not 
be considered an intrinsically quantum mechanical phenomenon. 
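The two routes to the final fidelity, Eqs. (11) and (12), rest only on the fact that the fidelity of product distributions factorizes. A minimal sketch, assuming NumPy and using hypothetical coarse-grained (binned) distributions so that the integral becomes a sum $F(f,g) = \sum_i \sqrt{f_i g_i}$, shows the mismatch directly: the initial fidelity equals $F(f_S,g_S)$, while perfect clones would have fidelity $F(f_S,g_S)^2$.

```python
import numpy as np


def fidelity(f, g):
    """Classical fidelity F(f, g) = sum_i sqrt(f_i * g_i) for discrete distributions."""
    return float(np.sum(np.sqrt(np.asarray(f, float) * np.asarray(g, float))))


# Hypothetical overlapping but non-identical distributions on S (4 coarse-grained cells)
f_s = np.array([0.5, 0.3, 0.2, 0.0])
g_s = np.array([0.0, 0.2, 0.3, 0.5])
h_c = np.array([1.0, 0.0, 0.0, 0.0])  # fiducial "ready" macrostate of the apparatus C

F = fidelity(f_s, g_s)
# Independent subsystems: joint distributions on SC are outer products.
F_initial = fidelity(np.outer(f_s, h_c).ravel(), np.outer(g_s, h_c).ravel())  # Eq. (12)
F_clones = fidelity(np.outer(f_s, f_s).ravel(), np.outer(g_s, g_s).ravel())   # Eq. (11)
print(F, F_initial, F_clones)  # F_initial equals F, but F_clones equals F**2 < F
```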


2.4. The EPR argument and the collapse of the wave-packet 

The seminal 1935 paper [26] by Einstein, Podolsky and Rosen asked whether quantum theory 
is “complete” or “incomplete”. In other words, they asked whether quantum theory might be only a 
stop-gap, with a yet deeper theory in which “God does not play dice”. Their analysis revolved 
around a two-particle state that displays correlations in both the position and the momentum 
degrees of freedom. 
It turns out that the EPR state and the measurements they considered actually lie within the 
classical fragment we have just described. In other words, at least as far as the EPR argument 
and measurements are concerned, the answer is a tentative “yes”: a perfectly deterministic 
and local completion does exist that reproduces all the statistics of the EPR experiment. It is 
only with Bell’s theorem that entangled states are shown to rule out any such underlying degrees 
of freedom that interact only locally, and thus that the classical picture of the world becomes untenable. 

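The EPR two-particle state is correlated in the two position degrees of freedom and has zero 
total momentum. Specifically, in the position representation, the state is 

$$|\Psi_{1,2}\rangle = \int dx_1\,dx_2\,\delta(x_1 - x_2 + c)\,|x_1\rangle_1|x_2\rangle_2, \tag{13}$$

where $\delta(x)$ is the Dirac delta-function distribution and $c$ is a constant. In the momentum 
representation the same state is 

$$|\Psi_{1,2}\rangle = \int dp_1\,dp_2\,e^{-ip_2 c/\hbar}\,\delta(p_1 + p_2)\,|p_1\rangle_1|p_2\rangle_2, \tag{14}$$

where the phase factor $e^{-ip_2 c/\hbar}$ does not affect the perfect anticorrelation of the two momenta.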

Such a state might be obtained through the decay of some massive particle with zero momentum 
into two lighter particles that propagate in opposite directions and are now a distance $c$ apart.^

Now, according to the orthodox account of quantum theory (i.e. the textbook treatment of 
quantum mechanics), the state $|\Psi_{1,2}\rangle$ represents a situation in which neither particle has a 
definite position, since it is not an eigenstate of either of the operators $\hat{x}_1$ or $\hat{x}_2$. Similarly, 
neither particle has a definite momentum. All we can say is that the system is in a state of definite 
relative position and definite total momentum, i.e. there is perfect correlation between the two 
positions and perfect anticorrelation between the two momenta. Note that, in the orthodox account, 
it is not simply a matter of each particle having a definite position and momentum that is currently 
unknown to us, but rather that the individual positions and momenta do not exist, since the only 
properties that can be ascribed to the system are those corresponding to operators of which the 
state is an eigenstate. 

The EPR argument amounts to the observation that if one measures the position of the first 
particle to be $x$ then the state of the remotely separated second particle is collapsed to a sharp 
position state $|x+c\rangle_2$, which, if measured, will always yield the value $x+c$ with certainty. Thus, 
according to the orthodox account, the position of the second particle pops into existence as 
soon as the first is measured and, since $c$ is arbitrary, the two particles may be arbitrarily far apart. 
This represents a kind of nonlocality in the orthodox account, since an observation made over here 
can cause something to instantaneously pop into existence very far away. The only way to avoid 
this is to “complete” quantum mechanics by positing that, in fact, the second particle did have 
a definite position before the first was measured, and this is exactly what EPR argued for. 

The same argument can also be run in momentum space. If one chooses to measure the 
momentum of the first particle and finds it to be $p$, then the state of the second particle is 
collapsed to a sharp momentum state $|-p\rangle_2$. Thus, according to the orthodox account, the 
momentum of the second particle pops into existence upon measuring the momentum of the 
first, and so EPR argued that the second particle must have a definite value of momentum prior 
to measurement in order to avoid nonlocality. It is rather striking that, on the orthodox account, 
the choice of which observable to measure affects which property pops into existence at a distant 
location, and that the completion of quantum mechanics proposed by EPR would violate a strict 
interpretation of the uncertainty principle. Note that this extra wrinkle on the argument is not 
required to establish the nonlocality of the orthodox account, which already follows just from 
considering position measurements on their own. 



^Note that, although the delta functions make this state unphysical, the same argument can be run with properly normalized 
Gaussian states that approximate them [12]. We use the idealized version for simplicity. 
It turns out that the statistical model [12] with restricted resolving power can exactly 
reproduce the EPR measurement statistics for position and momentum, since they are Gaussian 
measurements. Within this parallel model, the apparent paradox, and the potential conflict with 
relativity, are illusory and no longer a concern. To represent the state given in Eqs. (13) and (14) 
in the classical scenario, we define the two-particle distribution on phase space given by 

$$f(x_1,p_1;x_2,p_2) = \delta(x_1 - x_2 + c)\,\delta(p_1 + p_2), \tag{15}$$

which is the limit of a sequence of Gaussian distributions that all satisfy the RR-condition. This 
distribution has perfect correlations between the positions of the two particles, and perfect 
anticorrelations between their momenta. The account of the EPR experiment now takes on a 
simple form: the actual states of the particles are microstates of definite position and definite 
momentum. When we perform a local measurement that determines the position of the first 
particle, it is simply the probability distribution for the total system that changes, and not the 
physical state of the remote system. 
This is no different to the example of the correlated coins that was provided in the introduction, 
and highlights that in many ways the collapse of the wave-packet is no more strange than the 
updating of a probability distribution; the objective state of the remote system remains the 
same, and there is no conflict with relativity theory. It requires a stronger result such as Bell’s 
theorem to fully and conclusively rule out an account along these lines. 
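The classical account can be simulated directly. In this sketch, which assumes NumPy, the value of $c$, the broad Gaussian spread of $x_1$ and $p_1$, and the measurement window are arbitrary choices of ours; each run is a microstate drawn from the idealized distribution (15). “Measuring” $x_1$ is modelled as conditioning on the runs in which $x_1$ fell in a small window: our distribution for $x_2$ sharpens to a window around $x_1 + c$, yet no sampled value of $x_2$ is changed.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 3.0      # separation constant in Eq. (15); value chosen for illustration
n = 10_000

# Microstates from the idealized distribution (15): x1 and p1 are spread out
# (broad Gaussians, an assumption), while x2 = x1 + c and p2 = -p1 exactly.
x1 = rng.normal(0.0, 10.0, n)
p1 = rng.normal(0.0, 10.0, n)
x2 = x1 + c
p2 = -p1

# "Measuring" x1 on the first particle: keep the runs in which x1 fell in a small window.
window = (x1 > 2.0) & (x1 < 2.1)

# Before conditioning our knowledge of x2 is broad; afterwards it is sharp.
# No sampled value of x2 was altered: only the distribution we assign changed.
print(x2.std(), x2[window].std())
```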


2.5. Is $|\Psi\rangle$ analogous to a thermodynamic Gibbs state? 

Some people might claim that quantum mechanics is not the final theoretical framework for 
physics: there might be some even more fundamental theory yet to be discovered, which 
includes quantum theory as some kind of limiting case. Could it really be that quantum theory is 
incomplete [26], that there are underlying variables that give a more fine-grained description, 
and that our measurement devices actually respond to these underlying variables? Historically 
[15, 27], this is what happened with thermodynamics and statistical mechanics: the macroscopic 
properties of heat, temperature and pressure are well-defined properties obeying the laws of 
thermodynamics, but they admit a statistical mechanical description in terms of the rapid 
motion of underlying variables, namely atoms. How do we know that something like this could not 
happen in the future with quantum mechanics? Could the wave-function be more like the Gibbs 
distribution of statistical mechanics, and somehow point to new underlying degrees of freedom? 

We shall see that, in a precise sense and under a broad range of entirely reasonable assumptions, 
this can never happen in any future theory of physics. To tackle such a seemingly nebulous 
question, we must use a sufficiently general framework that contains only the most primitive 
notions of “states” and “statistical measurements”, and which can account for the predictions of 
quantum theory as a special case. Since the framework we describe can account for the 
predictions of quantum physics, as well as those of theories which are not quantum theory, it 
effectively allows us to regard quantum mechanics as an object in itself, and to delineate its 
properties in contrast to other theories, including classical mechanics. 

According to the textbook account [1, 3], the quantum state $|\Psi\rangle$ provides a full description 
of a quantum system, both in terms of its subsequent evolution in time and how it responds to 
any measurement that we may wish to perform. Every orthonormal basis of the Hilbert space $\mathcal{H}$ 
corresponds to a quantum measurement, and has outcome probabilities given by the Born rule. 
A qubit is any quantum system whose Hilbert space is two-dimensional, and so any state of 
the system is expressible as $|\Psi\rangle = \alpha|0\rangle + \beta|1\rangle$ for an orthonormal basis $\{|0\rangle, |1\rangle\}$ and complex 
numbers $\alpha, \beta \in \mathbb{C}$ obeying $|\alpha|^2 + |\beta|^2 = 1$. A general orthonormal basis contains two states 
$\{|\Phi\rangle, |\Phi^\perp\rangle\}$, and when a measurement is performed in this basis on a system prepared in the state 
$|\Psi\rangle$, the outcome probabilities are simply given by $|\langle\Phi|\Psi\rangle|^2$ and $|\langle\Phi^\perp|\Psi\rangle|^2$. 

Given this framework, what would it mean for the quantum state $|\Psi\rangle$ to be like a Gibbs 
state and admit some hypothetical microscopic structure? This would require that there exists 
some set $\Lambda$ of perhaps more fundamental states that give a sharper description of the system 
[28, 29]. If this did turn out to be the case, then instead of using $|\Psi\rangle$ to describe the physics 
we could replace it with a probability distribution over this set of variables $\Lambda$. More precisely, 
each quantum state $|\Psi\rangle$ would be associated with a probability distribution^ $p(\cdot|\Psi): \Lambda \to \mathbb{R}$ for 
which $p(\lambda|\Psi) \geq 0$ for any point $\lambda \in \Lambda$ and which is normalized as 

$$\int_\Lambda d\lambda\, p(\lambda|\Psi) = 1. \tag{16}$$

In particular, this means that the integral 

$$\int_R d\lambda\, p(\lambda|\Psi) = P(R|\Psi) \tag{17}$$

is the probability that the underlying microstate of the system is in the region $R$ of the 
full state space $\Lambda$ when the state $|\Psi\rangle$ is prepared experimentally. 

Once we have a notion of quantum states being described by probability distributions, how do 
we describe quantum measurements? This too is relatively easy. Suppose we perform a measurement 
in the basis $\{|0\rangle, |1\rangle\}$ on a qubit prepared in the quantum state $|\Psi\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$. 
According to quantum theory, the probabilities of the two outcomes are $q_0 = \cos^2\theta$ and $q_1 = \sin^2\theta$. 
However, in terms of the underlying variables, the measurement is described by a conditional 
probability distribution $p(j|\lambda)$, where $p(j|\lambda)$ is the probability that the measurement will return 
the outcome $j$ when the system occupies the microstate $\lambda$. This means that we must have 
$p(j|\lambda) \geq 0$ for all $\lambda \in \Lambda$, and also $p(0|\lambda) + p(1|\lambda) = 1$, so that the probabilities sum to one. 
If these functions are to correctly describe the observed measurement statistics then they must 
obey 

$$\int_\Lambda d\lambda\, p(j|\lambda)\, p(\lambda|\Psi) = q_j \tag{18}$$

for $j = 0, 1$. 

More generally, for a system of arbitrary dimension $n$, a measurement in the basis 
$\{|\Phi_0\rangle, |\Phi_1\rangle, \ldots, |\Phi_{n-1}\rangle\}$ performed on a system prepared in the state $|\Psi\rangle$ is described by a 
conditional probability distribution $p(\Phi_j|\lambda)$ over the $n$ outcomes that obeys 

$$\int_\Lambda d\lambda\, p(\Phi_j|\lambda)\, p(\lambda|\Psi) = |\langle\Phi_j|\Psi\rangle|^2 \tag{19}$$

for all $j = 0, \ldots, n-1$. This is what would be required in any hypothetical theory in order for 
the quantum predictions for measurement outcomes to be reproduced. 

It is vital to emphasize that this formulation includes the orthodox description of quantum 
mechanics, and so it is a broader framework that allows questions to be posed that cannot even be 
formulated within the traditional setting. To show that it includes the orthodox account, simply take $\Lambda$ to 
be the set of all quantum states, $\Lambda = \{|\Psi\rangle\}$, where states that differ by a global phase are 
identified, and let $p(\lambda|\Psi)$ be a delta-function distribution $p(\lambda|\Psi) = \delta(\lambda - \Psi)$, with weight just 
on the quantum state $|\Psi\rangle$ that is prepared. The measurement conditional probabilities can then 
simply be taken to be $p(\Phi_j|\lambda) = |\langle\Phi_j|\lambda\rangle|^2$, which trivially recovers the Born rule. 
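Equation (19) suggests a simple computational reading of the framework: a model is a pair of functions, one sampling a microstate $\lambda$ from $p(\lambda|\Psi)$ and one returning the response probabilities $p(\Phi_j|\lambda)$. The sketch below assumes NumPy, and the function names are hypothetical. It estimates the left-hand side of Eq. (19) by Monte Carlo and instantiates it with the orthodox model just described, for the qubit example of Eq. (18) and for a random basis in three dimensions. A model with nontrivial microstructure would plug in a different pair of functions.

```python
import numpy as np


def outcome_probabilities(prepare, response, psi, basis, n_samples, rng):
    """Monte Carlo estimate of Eq. (19): average p(Phi_j | lambda) over
    lambda drawn from p(lambda | Psi), for each basis vector Phi_j."""
    samples = [prepare(psi, rng) for _ in range(n_samples)]
    return np.array([np.mean([response(phi, lam) for lam in samples]) for phi in basis])


# Orthodox (psi-complete) model: the microstate is the quantum state itself.
def prepare_orthodox(psi, rng):
    return psi  # p(lambda | Psi) = delta(lambda - Psi)


def response_orthodox(phi, lam):
    return abs(np.vdot(phi, lam)) ** 2  # p(Phi_j | lambda) = |<Phi_j|lambda>|^2


rng = np.random.default_rng(0)

# Qubit example of Eq. (18): |Psi> = cos(theta)|0> + sin(theta)|1>
theta = 0.3
psi = np.array([np.cos(theta), np.sin(theta)])
qubit_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
qubit_probs = outcome_probabilities(prepare_orthodox, response_orthodox, psi, qubit_basis, 5, rng)
print(qubit_probs)  # [cos^2(theta), sin^2(theta)]

# Three-dimensional example: the columns of a random unitary Q form an orthonormal basis
z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(z)
psi3 = np.array([1.0, 1.0j, 0.5]) / np.linalg.norm([1.0, 1.0j, 0.5])
probs3 = outcome_probabilities(prepare_orthodox, response_orthodox, psi3, list(Q.T), 5, rng)
print(probs3, probs3.sum())  # Born-rule probabilities summing to one
```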

Why then should we bother to use such a description? The reason is that, since this is a general 
probabilistic setting that only requires the notion of abstract states $\lambda \in \Lambda$ and probability 
distributions, it uses only the most elementary notions of what one normally calls a “physical 
theory”. Such breadth makes it powerful, and will allow us to rule out alternative theories, 
identify intrinsically quantum phenomena, and study quantum theory “from the outside”. 

^Even more precisely, probability distributions should be associated to the procedures for preparing quantum states rather 
than the states themselves to account for a subtlety called preparation contextuality [50]. However, this subtlety does not 
affect any of the results presented here. See [30] for a more rigorous treatment that does deal with this topic. 


2.6. Overlapping distributions: The Kochen-Specker model 

In the orthodox account of quantum theory in which $p(\lambda|\Psi) = \delta(\lambda - \Psi)$, the distributions 
corresponding to different quantum states do not overlap: they are simply delta functions 
located at the different quantum states. However, for a qubit, there is a slightly more interesting 
representation, due to Kochen and Specker, which can be used to frame a fundamental question 
concerning the objectivity of the wave-function in quantum mechanics [23, 28]. 

An arbitrary quantum state of a qubit can be written as 

$$|\Psi\rangle = \cos(\theta/2)\,|0\rangle + e^{i\varphi}\sin(\theta/2)\,|1\rangle, \tag{20}$$

where $0 \leq \theta \leq \pi$ and $-\pi < \varphi \leq \pi$. By defining the vector $\vec{\Psi} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$, we 
see that the set of states of a qubit can be represented as points on the unit sphere $S^2$, which 
is known in this context as the Bloch sphere (see appendix C for further details). There is a 
one-to-one relation between every qubit state $|\Psi\rangle$ (up to a global phase) and a vector on the surface 
of the Bloch sphere and, in what follows, we simply use this vector to represent the quantum state. 
Moreover, if a state is measured in a basis $\{|\Phi\rangle, |\Phi^\perp\rangle\}$ then the probability of obtaining the $|\Phi\rangle$ 
outcome is 

$$|\langle\Phi|\Psi\rangle|^2 = \tfrac{1}{2}\left(1 + \vec{\Phi}\cdot\vec{\Psi}\right). \tag{21}$$
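Equation (21) is easy to verify numerically. This sketch assumes NumPy; the random sampling of angles is only for testing. It builds states from Eq. (20) and their Bloch vectors, and compares the Born-rule overlap with $\tfrac{1}{2}(1 + \vec{\Phi}\cdot\vec{\Psi})$.

```python
import numpy as np


def qubit_state(theta, phi):
    """|Psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>, as in Eq. (20)."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])


def bloch_vector(theta, phi):
    """Bloch vector (sin(theta) cos(phi), sin(theta) sin(phi), cos(theta))."""
    return np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])


rng = np.random.default_rng(0)
max_err = 0.0
for _ in range(1000):
    t1, t2 = rng.uniform(0.0, np.pi, size=2)
    f1, f2 = rng.uniform(-np.pi, np.pi, size=2)
    born = abs(np.vdot(qubit_state(t1, f1), qubit_state(t2, f2))) ** 2
    bloch = 0.5 * (1.0 + bloch_vector(t1, f1) @ bloch_vector(t2, f2))
    max_err = max(max_err, abs(born - bloch))
print(max_err)  # rounding-level agreement with Eq. (21)
```

Antipodal Bloch vectors, such as those of $|0\rangle$ ($\theta = 0$) and $|1\rangle$ ($\theta = \pi$), give $\vec{\Phi}\cdot\vec{\Psi} = -1$ and hence zero overlap, so orthogonal states sit at opposite poles of the sphere.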

Kochen and Specker’s model for a qubit employs the unit sphere $\Lambda = S^2$ as its space of 
microstates, and to every quantum state $|\Psi\rangle$, they associate a probability distribution 

$$p(\lambda|\Psi) = \frac{1}{\pi}\,\Theta(\vec{\Psi}\cdot\vec{\lambda})\;\vec{\Psi}\cdot\vec{\lambda}, \tag{22}$$

where $\Theta$ is the Heaviside step function 

$$\Theta(x) = \begin{cases} 1, & x > 0, \\ 0, & x < 0. \end{cases} \tag{23}$$

In the Bloch sphere representation, this means that $p(\lambda|\Psi)$ is only nonzero if the angle $\theta$ between 
$\vec{\Psi}$ and $\vec{\lambda}$ is less than $\pi/2$, and it takes the value $\cos\theta/\pi$ on this hemisphere. See Figure 4 for an 
illustration of these probability distributions. 

To represent measurements, a basis vector $|\Phi\rangle$ is represented by the conditional probabilities 
$p(\Phi|\lambda) = \Theta(\vec{\lambda}\cdot\vec{\Phi})$, which describe the probability of getting the outcome $|\Phi\rangle$ when the exact 
microstate is $\lambda$. On the Bloch sphere, this means that the outcome will be $|\Phi\rangle$ if the angle 
between $\vec{\lambda}$ and $\vec{\Phi}$ is less than $\pi/2$, and will be the orthogonal basis state otherwise. A direct calculation 
(e.g. see [30]) shows that the model yields 

$$P(\Phi|\Psi) = \int d\lambda\, p(\Phi|\lambda)\, p(\lambda|\Psi) = |\langle\Phi|\Psi\rangle|^2, \tag{24}$$

and so does in fact reproduce the Born rule for a qubit system. 

Why is this construction of interest to us? Firstly, the Kochen-Specker construction exactly 
reproduces the Born rule for a two-dimensional system and so can be viewed as a mini classical
fragment of quantum theory. However, the real reason it is of interest is that if we plot the 
distribution functions for two different states, we notice a distinct feature of the representation: 
two different quantum states $|\Psi_1\rangle$ and $|\Psi_2\rangle$ can have overlapping distributions. The core 
significance of this is that there is a region of microstates (the light-shaded part of Figure 4) 
that belong to the supports of both $p(\lambda|\Psi_1)$ and $p(\lambda|\Psi_2)$. If the system occupies the microstate 
$\lambda^*$, which lies in this overlap region, then a unique wave-function cannot be associated to it. In other 
words, within such a hypothetical model a quantum state $|\Psi\rangle$ would not be an objective property that 
is “carved into” the physical system! 

Figure 4. The Kochen-Specker construction: The set of microstates within the Kochen-Specker construction for a single 
qubit is simply the surface of a sphere. We plot $p(\lambda|\Psi_1)$ and $p(\lambda|\Psi_2)$ (the peaked, shaded caps with axial symmetry) for two 
non-orthogonal, pure quantum states $|\Psi_1\rangle$ and $|\Psi_2\rangle$, each of which is specified uniquely by its respective Bloch vector, 
$\vec{\Psi}_1$ or $\vec{\Psi}_2$. The probability distributions overlap, and so a microstate in the overlap region (denoted $\lambda^*$ in the figure) can 
equally be associated to either quantum state. If such a phenomenon were to occur for $\dim(\mathcal{H}) > 2$ within some future 
theory, then the wave-function need not be an intrinsic property of an individual physical system. 
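The direct calculation behind Eq. (24) can be sanity-checked by Monte Carlo integration over the sphere. In the sketch below, which assumes NumPy (the sample size, the seed and the angle $\gamma$ between the two Bloch vectors are illustrative), microstates $\lambda$ are drawn uniformly from $S^2$, weighted by $p(\lambda|\Psi)$ from Eq. (22), and counted towards outcome $|\Phi\rangle$ when $\Theta(\vec{\lambda}\cdot\vec{\Phi}) = 1$. By Eq. (21), the result should approach $\tfrac{1}{2}(1 + \cos\gamma)$.

```python
import numpy as np


def uniform_sphere(n, rng):
    """n points drawn uniformly from the unit sphere S^2."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def ks_born_probability(phi_vec, psi_vec, n, rng):
    """Monte Carlo estimate of Eq. (24) for the Kochen-Specker model.

    p(lambda|Psi) = Theta(Psi.lambda) Psi.lambda / pi   (Eq. (22))
    p(Phi|lambda) = Theta(lambda.Phi)
    Uniform samples have density 1/(4 pi), so the integral is 4 pi times the sample mean."""
    lam = uniform_sphere(n, rng)
    p_lam = np.clip(lam @ psi_vec, 0.0, None) / np.pi
    response = (lam @ phi_vec > 0).astype(float)
    return 4 * np.pi * np.mean(response * p_lam)


rng = np.random.default_rng(0)
psi_vec = np.array([0.0, 0.0, 1.0])
gamma = 1.0  # angle between the two Bloch vectors (illustrative)
phi_vec = np.array([np.sin(gamma), 0.0, np.cos(gamma)])
estimate = ks_born_probability(phi_vec, psi_vec, 200_000, rng)
print(estimate, 0.5 * (1 + np.cos(gamma)))  # agree up to Monte Carlo error
```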

Why might we want the distributions corresponding to different quantum states to overlap? 
Recall from §2.1 that the extremal states of the toy model overlap, as do the distributions in the 
restricted Liouville mechanics presented in §2.2. This overlap naturally explains why the extremal 
states cannot be perfectly distinguished, and why distributions cannot be cloned. Essentially, 
if two preparation procedures sometimes lead to the exact same microstate, then the action 
of any physical device cannot depend on which of the two preparation procedures was used 
whenever a microstate in the overlap region happens to be occupied. Therefore, the probability 
of successfully distinguishing or cloning macrostates is limited by the probability assigned to the 
overlap region. The overlapping distributions in the Kochen-Specker model explain why qubit 
states cannot be perfectly distinguished or cloned in the same way within this model. 
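The size of the overlap can be made quantitative. For two equally likely preparations, the best probability of guessing which one was made from the microstate alone is $1 - \tfrac{1}{2}\int d\lambda\, \min\{p(\lambda|\Psi_1), p(\lambda|\Psi_2)\}$. The sketch below assumes NumPy (the Bloch vectors and sample size are illustrative) and estimates this shared probability mass for the Kochen-Specker distributions of Eq. (22): it vanishes for orthogonal qubit states, whose Bloch vectors are antipodal, and is strictly positive for nonorthogonal ones.

```python
import numpy as np


def uniform_sphere(n, rng):
    """n points drawn uniformly from the unit sphere S^2."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def ks_density(lam, psi_vec):
    """Kochen-Specker distribution p(lambda|Psi) of Eq. (22), evaluated at points lam."""
    return np.clip(lam @ psi_vec, 0.0, None) / np.pi


def ks_overlap(psi1, psi2, n, rng):
    """Monte Carlo estimate of the shared mass: integral of min(p(lambda|Psi1), p(lambda|Psi2))."""
    lam = uniform_sphere(n, rng)
    return 4 * np.pi * np.mean(np.minimum(ks_density(lam, psi1), ks_density(lam, psi2)))


rng = np.random.default_rng(0)
z = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 0.0, 0.0])
print(ks_overlap(z, -z, 100_000, rng))  # orthogonal states (antipodal vectors): exactly 0.0
overlap = ks_overlap(z, x, 100_000, rng)
print(overlap, 1 - 0.5 * overlap)       # nonorthogonal states: positive overlap, imperfect guessing
```

For Bloch vectors at right angles the shared mass works out analytically to $1 - 1/\sqrt{2} \approx 0.29$, so even the optimal guess from the microstate fails about 15% of the time.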

This raises the question of whether such overlapping distributions could ever occur in some 
future theory for general quantum systems, and, if they could, then how much would we have to 
contort our perspective of the world? If we are necessarily forced to adopt something ridiculous 
or bizarre then we would conclude that this is not the case - the quantum state must indeed be 
carved into physical systems, and must be an objective label of a quantum system. This would 
show that the explanations of quantum phenomena in terms of overlapping distributions, despite 
their intuitive appeal, must in fact be wrong. 


3. Genuinely Non-Classical Aspects of Quantum Physics 

The previous results show that some fragments of quantum theory can be reproduced in ways 
that do not violently clash with our classical intuitions, but it is clear that we were being led into 
more and more contrived models in order to capture more and more of the phenomena of quantum 
theory. The take-home message so far is simply that many commonly touted phenomena 
(intrinsic randomness, complementarity, measurement-disturbance, no-cloning, collapse of the 
wave-packet, etc.) do not in themselves dramatically challenge our classical notions, as they 
already appear within theories such as the statistical mechanics model with the RR-condition 
presented above. In addition, we have shown that the physics of a two-dimensional quantum system 
can be reproduced by a statistical model in which the probability distributions associated to 
different quantum states can overlap. This raises the question of whether the quantum-mechanical 
wave-function is necessarily an objective property of a quantum system. 

We now describe results that reveal fundamentally non-classical phenomena, in the sense that 
any classical account underlying the quantum phenomena must be rather contorted. We begin 
with the seminal result of John Bell [20], and then move on to two more recent results that provide 
further insights into the strangeness of quantum theory. The first of these is Hardy’s theorem [32], 
which shows that the set $\Lambda$ of microstates must be infinitely large, even for a finite dimensional 
system, and hence that such systems must contain an infinite amount of information. The second 
is the Pusey-Barrett-Rudolph theorem [34], which shows that, under reasonable assumptions, 
the quantum state must be an objective property of an individual quantum system. 


3.1. Bell’s theorem: Quantum physics violates local causality 

The departure of quantum mechanics from classicality was put into a very sharp and powerful 
form by John Bell [2, 20], who showed that some aspects of quantum entanglement can never 
fit into a model in which systems possess objective properties prior to measurement and that 
also obeys a principle of locality. Since the result only depends on certain empirically observed 
predictions of quantum theory, rather than the structure of the theory itself, any future theory 
beyond quantum theory will be subject to the same argument, so there can be no going back to 
a conception of the world that is both classical and local. 

The version of Bell’s theorem we present here is due to Clauser, Horne, Shimony and Holt 
[21], and is the one most commonly used in experiments. To understand it, we need to explain 
both the mathematical components and the physical concepts. To avoid confusion, it is helpful 
to separate these two parts, so we begin with the mathematics. 

The easiest way to understand the mathematics of Bell’s theorem is in terms of a cooperative 
game, in which Alice and Bob are playing as a team against Charlie. Suppose that Alice and Bob 
have been captured and are being held prisoner by Charlie. Alice and Bob are told that the next 
morning they will be placed into two separate interrogation rooms with no possibility of 
communicating with each other. They will each be asked one of two possible yes/no questions. 
We call the question that Alice gets asked $x$ and the question that Bob gets asked $y$. For 
definiteness, we can imagine that both Alice’s and Bob’s questions are labelled 0 and 1, so that $x$ 
and $y$ are binary variables that take values 0 or 1. Let $A_x$ be the answer that Alice gives to 
question $x$ and let $B_y$ be the answer that Bob gives to question $y$. To be released they must get 
their stories straight in a very particular way. If they are both asked question 1 then their answers, 
$(A_1, B_1)$, must obey $A_1 \neq B_1$. In all other cases, i.e. if $(x,y) = (0,0)$, $(0,1)$ or $(1,0)$, they must 
provide answers for which $A_x = B_y$. 

Alice and Bob are told that they can spend the night together to discuss their strategy for 
answering the questions. They also have access to devices for generating classical randomness - 
for definiteness suppose they have a set of dice with different weightings - which they may use to 
determine their strategy and which they may also bring into the interrogation room with them. 
They are assured that Charlie will not eavesdrop on their discussions and, in fact, that he will 
choose which questions to ask completely randomly by flipping two separate coins. What is the 
best strategy for Alice and Bob to adopt that gives them the highest chance of being released? 

To begin with, let’s ignore the possibility of using randomness and ask what is the best that 
Alice and Bob can do if they employ a deterministic strategy, i.e. they simply have to decide, in 
advance, which answer Alice will give to each of her questions and which answer Bob will give to 
each of his. It is helpful to represent the target answers as a graph where the answers $A_0$, $A_1$, $B_0$ 
and $B_1$ are four vertices [35-37], and we connect the variables that should be equal by a solid line 
and those that should be different by a dotted line (see Figure 5). 



Figure 5. A Frustrated Network: The winning answer patterns for Alice and Bob. A solid line denotes that Alice and Bob 
should try to give the same answer, while a dotted line denotes that Alice and Bob should try to give opposite answers. It 
is impossible to satisfy all 4 constraints at the same time, but we can satisfy 3 of them. 


Now, let’s traverse the links in the graph and try to satisfy as many of the requirements as 
possible. Suppose we start by setting $A_0$ to “yes”. Following the top solid line, we see that $B_0$ 
should equal $A_0$, so it should also be assigned “yes”. The diagonal solid line connecting $B_0$ to 
$A_1$ then implies that $A_1$ should also be “yes”. The dotted line from $A_1$ to $B_1$ implies that $B_1$ 
should be different from $A_1$, so we set $B_1$ = “no”. However, now we have a problem because the 
diagonal solid line from $B_1$ to $A_0$ implies that these two should be equal, but we have already set 
$A_0$ to “yes” and $B_1$ to “no”, so we have only managed to satisfy three of the four requirements. 
Therefore, using this strategy, Alice and Bob will win if Charlie picks any of the question pairs 
$(0,0)$, $(1,0)$ or $(1,1)$, but they will lose if he picks $(0,1)$. Hence, their probability of winning is 
3/4, since the chance of Charlie picking $(0,1)$ is 1/4 if he chooses his questions by two fair coin 
flips. 

It is fairly easy to see that, however Alice and Bob assign “yes” and “no” to their questions, 
they can satisfy at most three of the four requirements. This is because, however you go 
about traversing the graph and assigning answers, you can never satisfy the final requirement 
encoded in the final link, because it contradicts the implications of the other three. For example, 
the three solid links force $A_1 = B_0 = A_0 = B_1$, which is incompatible with $A_1 \neq B_1$. Thus, 3/4 is 
the largest probability with which Alice and Bob can win the game via a deterministic strategy. 
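Because there are only $2^4 = 16$ deterministic strategies, the bound can also be confirmed by brute force. The Python sketch below (the helper names `wins` and `win_probability` are our own, and answers are encoded as 1 for “yes” and 0 for “no”) scores every assignment of answers $(A_0, A_1, B_0, B_1)$ against the four equally likely question pairs.

```python
from itertools import product


def wins(a, b, x, y):
    """Winning condition: answers must differ if both questions are 1, and agree otherwise."""
    return a != b if (x, y) == (1, 1) else a == b


def win_probability(A, B):
    """Winning probability of the deterministic strategy A = (A0, A1), B = (B0, B1)
    when Charlie picks each question pair (x, y) with probability 1/4."""
    return sum(wins(A[x], B[y], x, y) for x, y in product((0, 1), repeat=2)) / 4


# All 16 deterministic strategies (1 = "yes", 0 = "no")
strategies = [((a0, a1), (b0, b1)) for a0, a1, b0, b1 in product((0, 1), repeat=4)]
scores = [win_probability(A, B) for A, B in strategies]

print(max(scores))                      # 0.75
print(win_probability((1, 1), (1, 0)))  # the strategy traced in the text: 0.75
```

Every strategy in fact scores either 3/4 or 1/4: the four requirements cannot be satisfied simultaneously, and any assignment violates an odd number of them.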

This is the basic mathematics of Bell’s theorem, but we still have to deal with the possibility of 
probabilistic strategies that employ randomness. There are two ways in which Alice and Bob can 
use randomness. The first is that, when they are still together the night before the interrogation, 
they can roll some dice and each write down the results. They can then make their choice of 
answers depend on the outcomes of the dice rolls. For example, they might roll one die and 
agree that if it comes up odd then Alice and Bob should both answer “no” to all of the questions, 
whereas if it comes up even Bob will switch his answer to question 0 to “yes”. It will not make 
a difference if Alice and Bob look at the dice roll outcomes while they are still together and 
compute their answers, or if they simply write down the outcomes of the dice rolls, take them 
with them to the interrogation room, and perform the computation after the questions are asked. 
So long as they have agreed on a strategy for computing the answers, this will have the same 
result. It is easy to see that this cannot increase their probability of winning the game. On any
given outcome of the dice rolls, Alice and Bob will end up with some specific set of answers, and
the bound of 3/4 will apply to these. They will win with probability 3/4 whenever the dice rolls
lead to an assignment of answers that is an optimal deterministic strategy, and with probability
less than 3/4 when they do not, so their overall winning probability, being an average over the
dice outcomes, cannot exceed 3/4. Therefore, they may as well just pick an optimal deterministic
strategy to begin with.

The second way they can use randomness is to take some dice with them into the interrogation 
room, roll them after the questions have been asked, and use the outcome to determine their 
answer based on a pre-agreed strategy. This looks like it adds generality because they could 
choose to roll different dice, with different weightings, depending on which questions they are 
asked. For example, Alice might take a green die and a blue die into the interrogation room with
her and roll the green one if she is asked question 0 and the blue one if she is asked question 1. 
In this case, Alice’s answer does not even really come into existence until the question is asked, 
so this perhaps looks a bit more like what is going on in a quantum measurement. 

However, this cannot make a difference to the probability of winning either. Instead of waiting 
until she gets to the interrogation room, Alice could just roll both the green die and the blue die 
while she is still together with Bob. Then, she could just write down both outcomes and use one 
of them if she is asked question 0 and the other if she is asked question 1. This is already covered 
by the first way of using randomness, where Alice and Bob can make their answers an arbitrary 
function of the dice rolls they make when they are together. Although she is doing something
physically different (rolling two dice in advance rather than rolling just one die once she knows
the question), as far as the probability of winning the game is concerned we might as well just
move all the randomness generation to the beginning, when Alice and Bob are still together, 
and we already know this is no better than just choosing a deterministic strategy at the outset. 

Let us now reflect a bit on exactly what we have proved. In general, Alice and Bob can generate
randomness when they are together and do not yet know which questions they will be asked,
and also separately after they have been asked their questions. Let’s call the variables that
they generate when they are together $\lambda$, taking values in some space $\Lambda$. This will have some
probability distribution $p(\lambda)$. If Alice and Bob were together and able to communicate when
they are asked their questions, then the most general thing they could do would be to base their
answers $A_x$ and $B_y$ on the questions $x$ and $y$ they are asked, their prior shared randomness $\lambda$ and
any new randomness they choose to generate in the interrogation room. This can be described
by conditional probabilities $p(A_x, B_y|x, y, \lambda)$.

In general, a conditional probability distribution $p(A_x, B_y|x, y, \lambda)$ can be decomposed as

$$p(A_x, B_y|x, y, \lambda) = p(B_y|A_x, x, y, \lambda)\, p(A_x|x, y, \lambda). \tag{25}$$

What happens if we now take into account the fact that Alice and Bob are separated and unable
to communicate when they are asked their questions? Firstly, $A_x$ cannot depend on $y$, as Alice
does not know $y$ when she is asked her question, so we have $p(A_x|x, y, \lambda) = p(A_x|x, \lambda)$. Secondly,
Bob does not know $x$ or $A_x$ when he is asked his question, so $p(B_y|A_x, x, y, \lambda) = p(B_y|y, \lambda)$. Note
that this does not mean that Bob has no information about how Alice will answer her question.
In particular, if they have chosen a deterministic strategy then Bob knows exactly how Alice
will answer each question. However, the point is that $\lambda$ already encodes all of the information
Bob has about Alice’s strategy so, given $\lambda$, Bob’s answer has no additional dependence on $A_x$.

Altogether then, we have that 

$$p(A_x, B_y|x, y, \lambda) = p(A_x|x, \lambda)\, p(B_y|y, \lambda). \tag{26}$$

This condition is known as local causality, for reasons we shall see shortly. 

Finally, to work out the probabilities that Alice and Bob will answer $(A_x, B_y)$ to the pair of
questions $(x, y)$, we need to average over the randomness $\lambda$ they generated to obtain

$$p(A_x, B_y|x, y) = \int_\Lambda d\lambda\, p(A_x|x, \lambda)\, p(B_y|y, \lambda)\, p(\lambda). \tag{27}$$

What we have proved via our long discussion is that, in any strategy that satisfies local
causality, Alice and Bob can win the game with probability no greater than 3/4 or, equivalently,
if we define $p(A_x = B_y) = p(\text{“yes”}, \text{“yes”}|x, y) + p(\text{“no”}, \text{“no”}|x, y)$ and
$p(A_x \neq B_y) = p(\text{“yes”}, \text{“no”}|x, y) + p(\text{“no”}, \text{“yes”}|x, y)$, we have

$$\frac{1}{4}\left[p(A_0 = B_0) + p(A_0 = B_1) + p(A_1 = B_0) + p(A_1 \neq B_1)\right] \leq \frac{3}{4}, \tag{28}$$

which is an example of a Bell inequality.
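Shared and local randomness enter Eq. (27) only through averages, so they cannot lift the bound. As a numerical illustration (a sanity check, not a proof), the following sketch assumes a hypothetical finite hidden-variable space with five values of $\lambda$, draws random distributions $p(\lambda)$ and random local response probabilities, and evaluates the left-hand side of Eq. (28) exactly for each model.

```python
import random

def win_probability_lhv(p_lambda, alice_yes, bob_yes):
    """Exact win probability, via Eq. (27), for a locally causal model with a finite
    hidden-variable space. p_lambda[lam] is p(lambda); alice_yes[lam][x] is
    p(A_x = "yes" | x, lambda) and bob_yes[lam][y] is p(B_y = "yes" | y, lambda)."""
    total = 0.0
    for lam, weight in enumerate(p_lambda):
        for x in (0, 1):
            for y in (0, 1):
                a, b = alice_yes[lam][x], bob_yes[lam][y]
                # Local causality, Eq. (26): the joint answer probability factorizes.
                p_equal = a * b + (1 - a) * (1 - b)
                p_win = (1 - p_equal) if (x, y) == (1, 1) else p_equal
                total += weight * p_win / 4  # each question pair has probability 1/4
    return total

def random_lhv_model(n_lambda, rng):
    """Random normalized p(lambda) plus random local response probabilities."""
    weights = [rng.random() for _ in range(n_lambda)]
    norm = sum(weights)
    p_lambda = [w / norm for w in weights]
    alice_yes = [[rng.random(), rng.random()] for _ in range(n_lambda)]
    bob_yes = [[rng.random(), rng.random()] for _ in range(n_lambda)]
    return p_lambda, alice_yes, bob_yes

rng = random.Random(0)
best = max(win_probability_lhv(*random_lhv_model(5, rng)) for _ in range(10_000))
print(best <= 0.75)  # True
```

A deterministic strategy is the special case of a single $\lambda$ with response probabilities 0 or 1; for instance `win_probability_lhv([1.0], [[1.0, 1.0]], [[1.0, 0.0]])` reproduces the value 3/4 found above.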

Now, instead of classical randomness, let’s see what happens if we allow Alice and Bob to take 
correlated quantum systems with them into the interrogation rooms, and to base their responses 
on the outcomes of quantum measurements. Remarkably, this allows them to violate the bound 
and win the game with probability greater than 3/4. Specifically, suppose they share a pair of
qubits in the singlet state

$$|\Psi^-\rangle_{AB} = \frac{1}{\sqrt{2}}\left(|0\rangle_A \otimes |1\rangle_B - |1\rangle_A \otimes |0\rangle_B\right). \tag{29}$$

If Alice is asked question 0 then she measures the Pauli observable $\sigma_3$. (If Alice’s qubit is a
spin-1/2 particle then the Pauli observables $\sigma_1$, $\sigma_2$ and $\sigma_3$ correspond to the angular momenta
along the x, y and z directions respectively. The Pauli observables are defined in appendix C.)
If the outcome is +1 she answers $A_0$ = “no” and if it is −1 she answers $A_0$ = “yes”. If she is
asked question 1 then she instead measures the Pauli observable $\sigma_1$ and answers $A_1$ = “no” if
she gets +1 and $A_1$ = “yes” if she gets −1. Bob measures $(\sigma_3 + \sigma_1)/\sqrt{2}$ if asked question 0 and
answers $B_0$ = “yes” if he gets +1 and $B_0$ = “no” if he gets −1. If he is asked question 1 he
instead measures $(\sigma_3 - \sigma_1)/\sqrt{2}$ and answers $B_1$ = “yes” if he gets +1 and $B_1$ = “no” if he gets
−1. A straightforward computation reveals that, with this strategy, Alice and Bob will win with
probability $(2 + \sqrt{2})/4 \approx 0.854$ [2]. This is strictly larger than the classical bound and results
from the super-strong correlations that exist within the singlet state.
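The “straightforward computation” can be checked numerically. The sketch below assumes NumPy and the computational-basis conventions of Eq. (29); it builds the singlet state, the four measurement settings and the answer rules described above, and evaluates the left-hand side of Eq. (28) from the Born rule.

```python
import numpy as np

# Pauli observables sigma_1 and sigma_3 in the basis {|0>, |1>}.
SIGMA_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_3 = np.array([[1.0, 0.0], [0.0, -1.0]])

# Singlet state of Eq. (29) as a vector in the basis |00>, |01>, |10>, |11>.
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
singlet = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)

# Observables measured for each question.
ALICE = {0: SIGMA_3, 1: SIGMA_1}
BOB = {0: (SIGMA_3 + SIGMA_1) / np.sqrt(2), 1: (SIGMA_3 - SIGMA_1) / np.sqrt(2)}

def projector(observable, outcome):
    """Projector onto the eigenspace of a +/-1-valued observable for the given outcome."""
    return (np.eye(2) + outcome * observable) / 2

def p_answers_equal(x, y):
    """Born-rule probability that the two answers agree. Alice says "yes" on -1 and
    Bob says "yes" on +1, so the answers agree exactly when the outcomes are opposite."""
    return sum(singlet @ np.kron(projector(ALICE[x], a), projector(BOB[y], -a)) @ singlet
               for a in (+1, -1))

p_win = (p_answers_equal(0, 0) + p_answers_equal(0, 1)
         + p_answers_equal(1, 0) + (1 - p_answers_equal(1, 1))) / 4
print(round(p_win, 4), round((2 + np.sqrt(2)) / 4, 4))  # 0.8536 0.8536
```

Each of the four terms equals $(2 + \sqrt{2})/4$ on its own, which is why the average takes the same value.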

This concludes our discussion of the mathematics of Bell’s theorem, but what does it all
mean physically? Consider the spacetime diagram in Figure 6. A source emits two qubits in the
singlet state, which travel to two spacelike separated detectors, A and B, where an observable
is measured on each of them. Each of the detectors has two settings, corresponding to the two
questions that Alice and Bob might be asked in the game. We label the outcome at detector
A when it has setting $x$ as $A_x$ and the outcome at detector B when it has setting $y$ as $B_y$.
The choice of which setting to use at each detector is made at random, sufficiently late that no
signal travelling at or below the speed of light could get from the point at which setting $x$ is
chosen to the detection event $B_y$, and similarly for $y$ and $A_x$. If we set the detectors to measure
the observables described above, then we know we can obtain the value $(2 + \sqrt{2})/4$ for the
left-hand side of Eq. (28), in violation of the inequality.





Figure 6. The Violation of Local Causality: Alice and Bob, who share an entangled quantum state (figure on the left), 
perform local quantum operations in their spacelike separated interrogation rooms. They are asked questions (x, y) and give 
answers $(A_x, B_y)$, with causal structure as shown (figure on the right). It is impossible for them to signal faster than light,
but nonetheless local causality is violated. 


Given the causal structure of Figure 6, what would we expect to happen in a classical statistical
model? Consider a region of spacetime that “screens off” $x$ and $A_x$ from $y$ and $B_y$ and vice versa.
By this we mean that any timelike path from $x$ or $A_x$ to $B_y$ that passes through the source must
travel through the region. One such region is highlighted in light blue in Figure 6, but many
others are possible. Let $\lambda$ denote the state of all of the fundamental degrees of freedom that
exist in this region.

We would expect that any correlation that exists between the two wings of the experiment
should be mediated by $\lambda$. This is because we assumed that $\lambda$ describes all of the fundamental
degrees of freedom in the region, so there is nothing else within the light blue region that could
possibly mediate correlations, and any causal link between the two wings that does not pass
through this region would require super-luminal signalling. In other words, we are in precisely
the same scenario that Alice and Bob face in their separate interrogation rooms, unable to
communicate with each other. Thus, for the exact same reasons as before, we expect local
causality to hold, i.e. $p(A_x, B_y|x, y, \lambda) = p(A_x|x, \lambda)\, p(B_y|y, \lambda)$. As we have already shown, this
implies that the inequality given in Eq. (28) should hold.

Remarkably, quantum violations of the inequality have been observed in the experiments of
Aspect et al. [38], and in numerous experiments since then [39-41]. The implication of this is
that local causality must fail, and therefore either one accepts super-luminal influences at some
fundamental level, or one accepts that elementary notions of realism within statistical theories
must be discarded forever. In the literature, this is often referred to by saying that
either “locality” or “realism” must be given up. However you wish to parse the dilemma, it is
clear that Bell inequality violations imply a radical departure from classical physics.


3.2. Hardy’s theorem: Quantum systems contain an infinite amount of information 

At the heart of classical information theory is the idea of a classical bit - the information 
revealed by a single yes-no question. Our ability to quantify, encode and transform information 
has revolutionised the world in countless ways (telecommunications, the internet, computers, 
etc.), and its study has shed light on the foundations of physics. Central to this is the idea that 
information does not care how we choose to encode it - we can encode information on paper, 
in electronic pulses or carve it into stone. For almost all of history our encoding of information 
has been into classical degrees of freedom. However, Nature is quantum-mechanical and, in 
recent years, we have begun to use quantum degrees of freedom to encode information. A central 
question therefore arises: does information in quantum mechanics have the same properties as 
in classical mechanics? 

Now, the state of even the simplest quantum system, a qubit, is specified by continuous
parameters. This means that it requires an infinite amount of information to specify the state
exactly. For example, the amplitude $\alpha$ of $|0\rangle$ in the superposition $\alpha|0\rangle + \beta|1\rangle$ could encode the
decimal expansion of $\pi$. Thus, at first glance, it seems that quantum systems can carry
vastly more information than classical systems. However, Holevo [22, 42, 43] showed that only a
single bit of classical information can ever be extracted from a qubit via measurement.
Further, in spite of having a continuous infinity of pure states, quantum computers do not suffer
from the problems that rule out analog classical computers [22]. Powerful theorems on the
discretization of errors [22] tell us that we do not need to correct a continuum of errors, but only
particular discrete types. These surprising characteristics present a basic conundrum: how is it
that qubits behave as if they are discrete systems when their state space forms a continuum?

As already discussed, in classical statistical mechanics we can consider the allowed macrostates:
the set of probability distributions over some state space $\Lambda$ of microstates. It is easy to see
that these distributions also form a continuum, even if there is only a discrete finite set of
microstates. As an example, consider the case of DNA bases, which can be in one of 4 microstates
A, T, C or G. The macrostate for a single base is therefore a probability distribution
$p = (p_A, p_T, p_C, p_G)$, obeying $\sum_j p_j = 1$ and $0 \leq p_j \leq 1$ for all $j = A, T, C, G$. The set of such
distributions therefore forms a solid tetrahedron (a simplex) in 3-dimensional space, and there
is a continuum of macrostates (see Figure 7).

The fact that qubits behave in many ways like discrete, finite systems would be easily explained 
if perhaps there were only a finite number of more fundamental states - like the finite number
of DNA bases - and if the continuum of quantum states only represented our uncertainty about 
which one of them is occupied - like the continuum of DNA macrostates. Surprisingly, in spite 
of Holevo’s bound and the discretization of errors, this cannot be the case: any future physical 
theory that reproduces the physics of finite-dimensional quantum systems must have an infinite 
number of fundamental states. 


[Figure 7 panels: state space of a single DNA base; state space of a single qubit.]


Figure 7. The continuity of quantum states is non-classical: In a noisy environment, a single DNA base has state 
space manifold given by a tetrahedron. Genetic information can be made robust against errors, despite having a continuous
state space, because an underlying discreteness exists. Quantum information can also be made robust against errors; however, unlike for
DNA, there can never be any underlying discreteness. Any statistical theory that reproduces the quantum predictions must
have an infinity of microstates. Finite dimensional quantum systems necessarily contain an infinite amount of information, 
in contrast to classical systems with a finite state space. 


The proof, due to Hardy [32] (see [33] for an earlier related result), is quite straightforward.
Firstly, for the sake of contradiction, assume that $\Lambda$ is a finite set having $N$ elements, i.e. there
are $N$ fundamental states in the theory. Let $p(\lambda|\Psi)$ be the probability distribution corresponding
to $|\Psi\rangle$ in some hypothetical future theory, and define the support of $p(\lambda|\Psi)$ to be

$$\Lambda_\Psi = \{\lambda \,|\, p(\lambda|\Psi) > 0\}. \tag{30}$$

Consider a measurement basis that includes the state $|\Psi\rangle$. In the underlying theory, the $|\Psi\rangle$
outcome is represented by a conditional probability $p(\Psi|\lambda)$. By the discrete version of Eq. (19),
with integrals replaced by sums, if we prepare the state $|\Psi\rangle$ and measure in this basis, the
underlying theory must satisfy

$$\sum_\lambda p(\Psi|\lambda)\, p(\lambda|\Psi) = |\langle\Psi|\Psi\rangle|^2 = 1. \tag{31}$$

Now note that $p(\lambda|\Psi)$, being a probability distribution, must satisfy $\sum_\lambda p(\lambda|\Psi) = 1$. Since
also $p(\Psi|\lambda) \leq 1$, this means that $p(\Psi|\lambda)$ must equal 1 for all $\lambda \in \Lambda_\Psi$ in order to also make
Eq. (31) true.

Next, consider a two-dimensional subspace spanned by two orthonormal states $|0\rangle$ and $|1\rangle$, and
consider the $M$ states $|\Psi_j\rangle$


for $j = 0, 1, 2, \ldots, M-1$, as illustrated in Figure 7, where $M$ can be chosen as large as we wish.
For any finite $M$, these states satisfy


$$|\langle\Psi_k|\Psi_j\rangle|^2 < 1 \tag{33}$$

for any $k \neq j$.

We will now show that reproducing the statistics of these states implies that $\Lambda$ must contain
at least $\log_2 M$ states and hence, since $M$ can be chosen to be arbitrarily large, $N$ must be
infinite.

Consider preparing the system in the state $|\Psi_j\rangle$ and measuring it in a basis that contains $|\Psi_k\rangle$,
for $k \neq j$. Then, Eq. (33) implies that

$$\sum_\lambda p(\Psi_k|\lambda)\, p(\lambda|\Psi_j) = |\langle\Psi_k|\Psi_j\rangle|^2 < 1. \tag{34}$$

This means that there must exist a $\lambda \in \Lambda_{\Psi_j}$ such that $p(\Psi_k|\lambda) < 1$, since otherwise the sum
would equal 1. Since $p(\Psi_k|\lambda) = 1$ everywhere on $\Lambda_{\Psi_k}$, this means that $\Lambda_{\Psi_j}$ and $\Lambda_{\Psi_k}$ must be
different subsets of $\Lambda$. Since this argument applies to every pair $k \neq j$, we must in total have $M$
distinct subsets of $\Lambda$.

Now, if $\Lambda$ has $N$ elements then it has $2^N$ distinct subsets, so we must have $2^N \geq M$, or $N \geq \log_2 M$.
However, $M$ can be arbitrarily large, and since $\log_2 M \to \infty$ as $M \to \infty$, we conclude
that if the microstate description reproduces quantum theory then $\Lambda$ must have infinitely many
elements. No finite set of states will ever work, and even the most primitive quantum system
must contain an infinite amount of information, in stark contrast with classical theory. One can
further prove that there must in fact be a continuous infinity of microstates. A heuristic way to
see this is that the experimental probability distributions for quantum states vary smoothly as
we vary the measurement basis, and so any underlying model must also inherit this smoothness.
We refer the reader to the literature [44, 45] for a rigorous proof of this.
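The counting step can be made concrete. In the sketch below, the states are a hypothetical choice, not necessarily those of Eq. (32): $M$ real states evenly spaced in angle in the span of $|0\rangle$ and $|1\rangle$, which satisfy Eq. (33) for any finite $M$. The helper `min_microstates` (also our own naming) returns the smallest $N$ with $2^N \geq M$, i.e. the bound $N \geq \log_2 M$ used in the text.

```python
import numpy as np

def example_states(M):
    """Hypothetical family of M distinct pure states in span{|0>, |1>}:
    |Psi_j> = cos(j*pi/M)|0> + sin(j*pi/M)|1>, for j = 0, ..., M-1.
    Any M pairwise non-parallel states would serve equally well."""
    angles = np.arange(M) * np.pi / M
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)  # row j = |Psi_j>

def max_overlap(states):
    """Largest |<Psi_k|Psi_j>|^2 over pairs k != j; Eq. (33) requires this to be < 1."""
    overlaps = np.abs(states @ states.conj().T) ** 2
    np.fill_diagonal(overlaps, 0.0)
    return overlaps.max()

def min_microstates(M):
    """Smallest N with 2**N >= M, i.e. the counting bound N >= log2(M).
    Uses exact integer arithmetic to avoid floating-point log2."""
    return (M - 1).bit_length()

for M in (4, 100, 10**6):
    print(M, min_microstates(M))  # 4 2, then 100 7, then 1000000 20
print(max_overlap(example_states(100)) < 1)  # True
```

The required $N$ grows without bound as $M$ does, which is the content of the argument: no fixed finite $\Lambda$ can accommodate every finite $M$.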

It is said that discreteness is a distinctly quantum-mechanical phenomenon, but anyone who
has ever played a video game will tell you that the concept of a discrete (pseudo-random) classical
world is really not so strange. Hardy’s theorem, together with results such as Holevo’s bound and
the discretization of errors, shows that precisely the opposite is the case: it is instead the continuity
of quantum physics that is so strange. How can it be that we have a continuum of quantum
states that ostensibly behave discretely, but we do not have, and cannot have, an underlying
discrete structure?


3.3. The Pusey-Barrett-Rudolph Theorem: Is the wave-function Ψ carved into a quantum system?

In §2.6, we alluded to a particularly subtle question: is the wave-function an objective property 
of a quantum system? Again, our meaning comes through comparison with classical statistical 
mechanics. There the microstates are the objective properties of the system - at any instant 
of time the “real” state of the system is actually a particular microstate, in contrast to the 
system’s macrostate which simply describes the ensemble properties of the system and yields the 
thermodynamic variables of interest. Despite there being no consensus on what, if any, objective 
properties exist in the quantum realm, we can still ask whether it is possible for a system to 
have the same objective properties when it is prepared in one of two different quantum states. 
If the answer is “no”, then we know that the wave-function must be considered an intrinsic, 
or objective property of a system. Note that Hardy’s theorem does not settle the question of 
whether the wave-function is an intrinsic property because, although the space of fundamental 
states must be a continuum, the distributions corresponding to distinct quantum states may still 
overlap, as they do in the Kochen-Specker model discussed in §2.6. 

The broad statistical framework allows us to frame this question in a quantitative way. For
any hypothetical theory with fundamental states $\lambda$ that reproduces the predictions of quantum
mechanics for some system, we say that the wave-function is an intrinsic property of the system if
any two distinct quantum states $\Psi_1$ and $\Psi_2$ have corresponding probability distributions $p(\lambda|\Psi_1)$
and $p(\lambda|\Psi_2)$ that do not overlap. To put it another way, consider the case where $\Lambda$ is a finite
set†. Then, by $p(\lambda|\Psi_1)$ and $p(\lambda|\Psi_2)$ overlapping, we mean that there exists a microstate $\lambda^* \in \Lambda$
for which both $p(\lambda^*|\Psi_1) > 0$ and $p(\lambda^*|\Psi_2) > 0$. If the system occupies $\lambda^*$ then we would not be
able to tell with certainty which of the two quantum states had been prepared, even if we had
access to the full microstate of the system. Hence, it would be impossible to ascribe a unique
wave-function to this microstate; in such a hypothetical scenario the wave-function would not
be an intrinsic property of the system.

Figure 8. Is the wave-function Ψ carved into a quantum system?

We now discuss a recent result due to Pusey, Barrett and Rudolph, which shows that, under 
two additional reasonable assumptions, the wave-function must be an intrinsic property of a 
quantum system [34]. 

It suffices to consider a two-dimensional quantum system since, for any higher-dimensional
system, we can simply restrict to a two-dimensional subspace. We establish a reductio ad absurdum
by supposing that there exists some future statistical theory in which $\Psi$ is not an intrinsic
property, and deriving a contradiction from this.

Suppose that for two quantum states $|\Psi_1\rangle$ and $|\Psi_2\rangle$, the corresponding distributions $p(\lambda|\Psi_1)$
and $p(\lambda|\Psi_2)$ overlap. Again, we specialize to a finite space for simplicity, so this means that
there is an underlying microstate $\lambda^*$ which has a non-zero probability of occurring if we prepare either
the quantum state $|\Psi_1\rangle$ or the quantum state $|\Psi_2\rangle$. More precisely, regardless of which of these
states is prepared, the microstate $\lambda^*$ will be occupied at least a non-zero fraction $P^* > 0$ of the time.

We now introduce the two assumptions used to prove the theorem. Firstly, imagine preparing 
two copies of the system in one of the four quantum states 

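$$\begin{aligned}
|\Psi_{11}\rangle &= |\Psi_1\rangle \otimes |\Psi_1\rangle, & |\Psi_{12}\rangle &= |\Psi_1\rangle \otimes |\Psi_2\rangle, \\
|\Psi_{21}\rangle &= |\Psi_2\rangle \otimes |\Psi_1\rangle, & |\Psi_{22}\rangle &= |\Psi_2\rangle \otimes |\Psi_2\rangle.
\end{aligned} \tag{35}$$

We can imagine that the two systems are initially located very far apart, and Alice and Bob
each choose whether to prepare $|\Psi_1\rangle$ or $|\Psi_2\rangle$ independently of each other.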

The first assumption is that, when two systems are prepared independently like this, each
system gets its own copy of $\lambda$. The total state space of the two systems is simply the product
of two copies of $\Lambda$, and so microstates are given by $(\lambda_1, \lambda_2) \in \Lambda \times \Lambda$. This means that the
quantum states $|\Psi_{jk}\rangle$ are associated with probability distributions of the form $p(\lambda_1, \lambda_2|\Psi_{jk})$, and
measurements on the joint system are associated with conditional probability distributions of
the form $p(\Phi|\lambda_1, \lambda_2)$, where $|\Phi\rangle$ is a vector in the basis we are measuring.

The second assumption is that the distribution describing the total state, $p(\lambda_1, \lambda_2|\Psi_{jk})$, factorizes
as $p(\lambda_1, \lambda_2|\Psi_{jk}) = p(\lambda_1|\Psi_j)\, p(\lambda_2|\Psi_k)$, where $p(\lambda_1|\Psi_j)$ and $p(\lambda_2|\Psi_k)$ are the distributions
that would be associated with $|\Psi_j\rangle$ and $|\Psi_k\rangle$ for a single system. Taken together, these two
assumptions are called preparation independence.


† Measure-theoretic qualifications are needed to deal with the general case. See [30] for details.
We now illustrate, by an example, how preparation independence can be used to prove that the
quantum state must be an intrinsic property of a quantum system, before outlining the general
result. Suppose that $|\Psi_1\rangle = |0\rangle$ and $|\Psi_2\rangle = |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Under the assumption of
preparation independence, if we prepare one of the four states $|\Psi_{jk}\rangle$ then, at least a fraction
$(P^*)^2$ of the time, the joint system will be in the microstate $(\lambda^*, \lambda^*)$, i.e. both systems will
occupy the microstate $\lambda^*$, for which it is impossible to tell with certainty which of $|\Psi_1\rangle$ or $|\Psi_2\rangle$
was prepared. This leads to the desired contradiction, once we consider the following two-qubit
measurement in the basis consisting of the four vectors

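$$\begin{aligned}
|\Phi_{11}\rangle &= \tfrac{1}{\sqrt{2}}\left(|0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle\right) \\
|\Phi_{12}\rangle &= \tfrac{1}{\sqrt{2}}\left(|0\rangle \otimes |-\rangle + |1\rangle \otimes |+\rangle\right) \\
|\Phi_{21}\rangle &= \tfrac{1}{\sqrt{2}}\left(|+\rangle \otimes |1\rangle + |-\rangle \otimes |0\rangle\right) \\
|\Phi_{22}\rangle &= \tfrac{1}{\sqrt{2}}\left(|+\rangle \otimes |-\rangle + |-\rangle \otimes |+\rangle\right),
\end{aligned} \tag{36}$$

where $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.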

Why is this particular (entangled) basis significant? The key point is that $\langle\Phi_{jk}|\Psi_{jk}\rangle = 0$
for every choice of $j, k = 1, 2$. Put more simply: the $|\Phi_{jk}\rangle$ measurement outcome never occurs
when the quantum state $|\Psi_{jk}\rangle$ is prepared. However, we know that, whichever of these states is
prepared, a non-zero fraction of the time the system is in the hypothetical microstate $(\lambda^*, \lambda^*)$. If the
system is in this microstate, then which particular outcome occurs when we make a measurement
in this basis? Suppose we get the outcome $|\Phi_{jk}\rangle$ when the system occupies $(\lambda^*, \lambda^*)$. We know
that this microstate occurs a non-zero fraction of the time when $|\Psi_{jk}\rangle$ is prepared so, in order to
reproduce the quantum predictions, the outcome $|\Phi_{jk}\rangle$ should never occur for this microstate.

Thus, a non-zero fraction of the time the measurement device cannot give any outcome that is 
consistent with quantum mechanics, and we have our contradiction. 
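Both properties of the basis in Eq. (36) that the argument relies on can be verified directly. The sketch below assumes NumPy and takes $|\Psi_1\rangle = |0\rangle$ and $|\Psi_2\rangle = |+\rangle$ as above; it checks that the four vectors $|\Phi_{jk}\rangle$ form an orthonormal basis and that $\langle\Phi_{jk}|\Psi_{jk}\rangle = 0$ for every $j, k$.

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)
kron, s = np.kron, 1 / np.sqrt(2)

# Product preparations |Psi_jk> of Eq. (35), with |Psi_1> = |0> and |Psi_2> = |+>.
psi = {(1, 1): kron(ket0, ket0), (1, 2): kron(ket0, plus),
       (2, 1): kron(plus, ket0), (2, 2): kron(plus, plus)}

# Entangled measurement basis |Phi_jk> of Eq. (36).
phi = {(1, 1): s * (kron(ket0, ket1) + kron(ket1, ket0)),
       (1, 2): s * (kron(ket0, minus) + kron(ket1, plus)),
       (2, 1): s * (kron(plus, ket1) + kron(minus, ket0)),
       (2, 2): s * (kron(plus, minus) + kron(minus, plus))}

labels = sorted(phi)
basis = np.stack([phi[k] for k in labels])
# Orthonormal basis: the Gram matrix of the four vectors is the identity.
is_orthonormal = np.allclose(basis @ basis.T, np.eye(4))
# Outcome Phi_jk never occurs when Psi_jk is prepared: |<Phi_jk|Psi_jk>|^2 = 0.
never_occurs = all(np.isclose(abs(phi[k] @ psi[k]) ** 2, 0.0) for k in labels)
print(is_orthonormal, never_occurs)  # True True
```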

The above argument was specific to the states $|0\rangle$ and $|+\rangle$, but it can be extended to show that
every pair of pure states must correspond to non-overlapping distributions. Here, we will just 
outline a version of this argument due to Moseley [46], and refer the reader to [30] for details. 

First, note that if $|\langle\Psi_1|\Psi_2\rangle|^2 \leq 1/2$, then the states $|\Psi_1\rangle$ and $|\Psi_2\rangle$ are at least as
distinguishable as $|0\rangle$ and $|+\rangle$, in the sense that it is at least as easy to tell them apart via a quantum
measurement. It is known that, when this is the case, it is possible to find a physical
transformation that maps $|\Psi_1\rangle$ to $|0\rangle$ and $|\Psi_2\rangle$ to $|+\rangle$ [47, 48]. If we apply this transformation to
both systems and then make a measurement in the $|\Phi_{jk}\rangle$ basis, then this whole procedure can
itself be thought of as a measurement on the states $|\Psi_{jk}\rangle$. This will have the same measurement
probabilities as before, so the previous argument can be adapted to this case.

It remains to deal with the case where $|\langle\Psi_1|\Psi_2\rangle|^2 > 1/2$. For this, we note that if, instead
of preparing one system in the state $|\Psi_1\rangle$ or $|\Psi_2\rangle$, we prepare $n$ systems either all in the state
$|\Psi_1\rangle$ or all in the state $|\Psi_2\rangle$, then the squared modulus of the inner product of the resulting states
will be $|\langle\Psi_1|\Psi_2\rangle|^{2n}$. For $|\langle\Psi_1|\Psi_2\rangle|^2 < 1$, it is possible to choose $n$ such that
$|\langle\Psi_1|\Psi_2\rangle|^{2n} \leq 1/2$, and thus we can apply the previous argument to show that the distributions
corresponding to $|\Psi_1\rangle^{\otimes n}$ and $|\Psi_2\rangle^{\otimes n}$ can have no overlap. However, preparation independence
implies that these distributions are just $n$-fold products of the distributions corresponding to $|\Psi_1\rangle$
and $|\Psi_2\rangle$, so we infer that these cannot have any overlap either.
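For a concrete feel for how many copies this amplification step needs, the short sketch below (our own illustration; `copies_needed` is a hypothetical helper) finds the smallest $n$ with $|\langle\Psi_1|\Psi_2\rangle|^{2n} \leq 1/2$.

```python
def copies_needed(overlap_sq):
    """Smallest n with overlap_sq**n <= 1/2, where overlap_sq = |<Psi_1|Psi_2>|^2 < 1.

    The n-fold products |Psi_1>^{(x)n} and |Psi_2>^{(x)n} have squared overlap
    overlap_sq**n, so this is the number of copies after which the
    |<Psi_1|Psi_2>|^2 <= 1/2 case of the argument applies."""
    if not 0 <= overlap_sq < 1:
        raise ValueError("need 0 <= |<Psi_1|Psi_2>|^2 < 1")
    n = 1
    while overlap_sq ** n > 0.5:
        n += 1
    return n

print([copies_needed(s) for s in (0.5, 0.9, 0.99)])  # [1, 7, 69]
```

For example, `copies_needed(0.9)` returns 7, since $0.9^6 \approx 0.531$ while $0.9^7 \approx 0.478$. The number of copies grows without bound as the overlap approaches 1, but it is finite for any pair of distinct states, which is all the argument needs.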

The contradiction we have derived rules out a non-zero $P^*$, which quantifies how much the
distributions corresponding to any pair of pure states can overlap in any future theory that can
reproduce the predictions of quantum theory. We conclude that there can never exist a theory
in which pure states have overlapping distributions, unless preparation independence is violated
[49]. It is clear that preparation independence carries great force, and is an extremely natural
assumption to make: imagine preparing a hydrogen atom close to its ground state in one
part of the universe, and another person preparing another such atom on the other side of the
galaxy; no correlations ought to be created between these two preparations, which should be fully
independent. We have just shown that this deceptively simple principle has a deep implication:
whatever future theory may arise, the quantum wave-function $\Psi$ will always be carved into a
quantum system as an objective label of reality.


4. Conclusion 

To sum up, we have shown that many phenomena that are traditionally viewed as intrinsically
quantum-mechanical, such as randomness, discreteness, the indistinguishability of states,
measurement-uncertainty, measurement-disturbance, complementarity, non-commutativity,
interference, the no-cloning theorem, and the collapse of the wave-packet, all appear within
classical statistical mechanics under reversible dynamics. These serve to map out classical fragments
of quantum physics, in a search for the genuinely strange aspects of the theory. In addition to
Bell’s theorem on the failure of local causality at a fundamental level, we have described two
less well-known results that reveal further deep and subtle insights into the quantum realm.
Quantum systems unavoidably contain a continuous infinity of information, despite their
apparently discrete behaviour, while the ability to prepare physical systems independently of one
another implies that the quantum wave-function is carved onto a quantum system as an objective
physical property of its microstate.

In this article, we have only presented a small sample of recent results, in what is a flourishing 
area of current research. For example, we have not discussed the recently developed operational 
approach to contextuality [50], which is another genuinely nonclassical quantum phenomenon, 
or the powerful graph theoretic approach to contextuality [51, 52]. We have neglected many 
beautiful and deep results, such as those of Colbeck and Renner [53, 54], who rule out theories
beyond quantum mechanics using weaker assumptions than Bell; the results of Montina [55-59]
that link the structure of classical fragments to the far more grounded topic of
the communication complexity of quantum channels; and, following on from the Pusey-Barrett-Rudolph
theorem, many other results and experiments on the reality of the wave-function [30, 60-73].
There are also ambitious programs that seek to reformulate quantum theory as a theory
of Bayesian inference [74], to derive quantum theory from physically reasonable axioms [75-79],
and to formulate quantum theory in the absence of fixed causal structure [80-84]. Finally,
we have not discussed prominent areas of research such as quantum computing [22], quantum 
cryptography [85] and quantum metrology [86], which are the practical fruit of foundational 
investigations, and have their own insights to offer about the difference between classical and 
quantum physics. For these and more, we refer the interested reader to the bibliography, where 
they will find an array of diverse and vibrant research programs that continue to delve into the 
very foundations of quantum physics. 
Appendix A. The Resolution Restriction on classical statistical mechanics


Any statistical Liouville distribution is a function $f$ on the system’s phase space (the phase
space of a system of $N$ canonical coordinates), with $f(x, p) \geq 0$ and such that $\int dx\, dp\, f(x, p) = 1$.
For simplicity we restrict attention to a single particle in one dimension, but the same argument
holds for more general systems. The expectation value of the particle’s location is given by
$\langle x\rangle := \int dx\, dp\, f(x, p)\, x$, while the expectation value of its momentum is $\langle p\rangle := \int dx\, dp\, f(x, p)\, p$.

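The RR-constraint in phase space is imposed on the fluctuations about the mean $(\langle x\rangle, \langle p\rangle)$ for
the Liouville distributions. To this end, we define $z_1 = x$ and $z_2 = p$, and a symmetric covariance
matrix $\gamma_{ij} := \langle z_i z_j\rangle - \langle z_i\rangle\langle z_j\rangle$ that measures the fluctuations of the Liouville distribution. In
terms of position and momentum it is given by

$$\gamma = \begin{pmatrix} (\Delta x)^2 & \langle xp\rangle - \langle x\rangle\langle p\rangle \\ \langle xp\rangle - \langle x\rangle\langle p\rangle & (\Delta p)^2 \end{pmatrix}, \tag{A1}$$

where $(\Delta x)^2 = \langle x^2\rangle - \langle x\rangle^2$ and $(\Delta p)^2 = \langle p^2\rangle - \langle p\rangle^2$ are the variances of the position and
momentum for the Liouville distribution.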

The RR-condition states that we only allow Liouville distributions $f(x, p)$ that have some
minimal level of fluctuations, as measured by $\gamma$. We could impose this as a constraint on the
eigenvalues of $\gamma$, but a more elegant way is to demand that the matrix $\gamma$ obey the matrix inequality

$$\gamma + \alpha D \geq 0, \tag{A2}$$

for some $2 \times 2$ matrix $D$ and some complex number $\alpha$ that measures the scale of the “boxes”
on phase space, and where, for a matrix $M$, $M \geq 0$ means that $M$ is positive semidefinite,
i.e. all the eigenvalues of $M$ are $\geq 0$. Since $\gamma$ is always positive semidefinite, the case $\alpha = 0$ corresponds
to switching off the constraint.

Now, classical mechanics carries a symplectic structure [13]. The dynamics generates a Liouville
flow, which preserves this symplectic structure. However, from the perspective of statistical
mechanics, macroscopically reversible transformations correspond to the subset of linear symplectic
transformations where the canonical coordinates transform as $z \to A^T z$ for some symplectic
matrix $A(t)$, obeying $A^T \Sigma A = \Sigma$. The case of more general symplectic transformations is found
to correspond to entropy production and irreversibility under the statistical restriction (see [12]
and [13] for more details). The matrix $\Sigma$ originates from the classical Poisson bracket and is
given by

$$\Sigma = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \tag{A3}$$


This is simply the symplectic matrix defining the group action of reversible classical dynamics. Now, for the above RR-condition to be meaningful it must be maintained under arbitrary dynamics. For the linear symplectic case the matrix $\gamma$ transforms as $\gamma \to A(t)^T \gamma A(t)$; this implies that $A(t)^T D A(t)$ should be equal to $D$ in order that the RR-condition has an invariant form under dynamics. It is thus sufficient to choose $D = \Sigma$ in the constraint. Moreover, $\Sigma$ is a skew-symmetric matrix with purely imaginary eigenvalues and so, since $\gamma$ is a real symmetric matrix, the constant $\alpha$ must be purely imaginary for $\gamma + \alpha\Sigma$ to be Hermitian, and hence for the positivity condition to give a meaningful constraint on the expectation values (note the slightly different notation used here compared to the main text, in which $i\lambda = \alpha$ and $C = iD$). We have argued for this identification under linear symplectic transformations, but it holds more generally. The RR-condition therefore becomes

$$\gamma + i\lambda\Sigma \geq 0, \tag{A4}$$

for some fixed minimal scale $\lambda > 0$ on the classical phase space.
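A minimal numerical check of (A4), assuming a single degree of freedom so that $\gamma$ is $2\times 2$: the sketch below tests positive semidefiniteness of $\gamma + i\lambda\Sigma$ via the closed-form smallest eigenvalue of a $2\times 2$ Hermitian matrix, and illustrates the invariance argument above. It relies on the fact that a real $2\times 2$ matrix satisfies $A^T\Sigma A = (\det A)\,\Sigma$, so $A$ is symplectic exactly when $\det A = 1$. The helper names and example matrices are hypothetical.

```python
import math

# Sigma as in (A3).
SIGMA = [[0.0, -1.0], [1.0, 0.0]]

def rr_condition_holds(gamma, lam, tol=1e-12):
    """Return True if gamma + i*lam*Sigma is positive semidefinite, cf. (A4).

    gamma is a real symmetric 2x2 covariance matrix; lam is the minimal scale.
    """
    a, d = gamma[0][0], gamma[1][1]
    # Entry (0, 1) of the Hermitian matrix gamma + i*lam*Sigma.
    b = complex(gamma[0][1], lam * SIGMA[0][1])
    # Smallest eigenvalue of the Hermitian matrix [[a, b], [conj(b), d]].
    min_eig = (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b.real ** 2 + b.imag ** 2)
    return min_eig >= -tol

def transform(gamma, A):
    """Linear map of the covariance matrix, gamma -> A^T gamma A."""
    def mul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    A_T = [[A[j][i] for j in range(2)] for i in range(2)]
    return mul(mul(A_T, gamma), A)

lam = 1.0
gamma = [[1.0, 0.0], [0.0, 1.0]]        # saturates the bound for lam = 1
shear = [[1.0, 1.0], [0.0, 1.0]]        # det = 1, hence symplectic in 2D
shrink = [[0.5, 0.0], [0.0, 0.5]]       # det = 1/4, not symplectic

print(rr_condition_holds(gamma, lam))                       # True
print(rr_condition_holds(transform(gamma, shear), lam))     # True
print(rr_condition_holds(transform(gamma, shrink), lam))    # False
```

The symplectic shear keeps the distribution on the boundary of the allowed region, while the non-symplectic contraction pushes it below the minimal resolution scale, which is why only symplectic dynamics preserves the constraint.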


August 4, 2015 0:35 Contemporary Physics contemp-phys-review 



Finally, we follow a statistical mechanics account of the physics [15, 16] so that, for a given covariance matrix $\gamma$, we use the Gibbsian distribution $f$ that maximizes the thermodynamic entropy $S = -\int dx\,dp\, f(x,p)\log f(x,p)$ and has covariance matrix $\gamma$. Thus the scenario we have described is precisely one of classical statistical mechanics where our classical resolving power is bounded in phase space by a scale $\lambda$. Indeed, for zero cross-correlations we find that $\Delta x\,\Delta p \geq \lambda$, and so the RR-condition encodes a classical uncertainty relation on the statistical system.
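For zero cross-correlations the last step can be made explicit. Using $\Sigma$ from (A3) and the standard fact that a $2\times 2$ Hermitian matrix is positive semidefinite exactly when its diagonal entries and its determinant are nonnegative, condition (A4) reduces as follows:

```latex
\gamma + i\lambda\Sigma =
\begin{pmatrix} (\Delta x)^2 & -i\lambda \\ i\lambda & (\Delta p)^2 \end{pmatrix} \geq 0
\iff
\det(\gamma + i\lambda\Sigma) = (\Delta x)^2 (\Delta p)^2 - \lambda^2 \geq 0
\iff
\Delta x \, \Delta p \geq \lambda .
```

The diagonal entries are variances and hence automatically nonnegative, so only the determinant condition carries content.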


Appendix B. General structure of quantum theory

In quantum theory, the state space of a physical system is a complex Hilbert space $\mathcal{H}$. A quantum state $|\Psi\rangle$ is a unit vector in $\mathcal{H}$. An observable is represented by a self-adjoint operator $A$, $A^\dagger = A$. Any such operator has a set of real eigenvalues $a_j$, which represent the possible outcomes of a measurement of the observable. By the spectral theorem, $A$ can be written as $A = \sum_j a_j P_j$, where $P_j$ is the projector onto the eigenspace corresponding to $a_j$. Note that, for an operator with continuous spectrum, the sum would be replaced by an integral. If each eigenspace is one-dimensional, then $A$ is called non-degenerate and the projectors can be written as $P_j = |\Phi_j\rangle\langle\Phi_j|$, where $|\Phi_j\rangle$ is the eigenvector corresponding to the eigenvalue $a_j$. We shall only consider non-degenerate observables in what follows.

The eigenstates $|\Phi_j\rangle$ of a non-degenerate observable always form a complete orthonormal basis for the Hilbert space $\mathcal{H}$, so a quantum state $|\Psi\rangle$ can be decomposed in this basis as

$$|\Psi\rangle = \sum_j \alpha_j |\Phi_j\rangle, \tag{B1}$$

where $\alpha_j = \langle\Phi_j|\Psi\rangle$, and again the sum would be replaced by an integral for an observable with a continuous spectrum.

If the system is prepared in the state $|\Psi\rangle$ and the observable $A$ is measured, then the outcome $a_j$ is obtained with probability $|\langle\Phi_j|\Psi\rangle|^2 = |\alpha_j|^2$. Assuming that the measurement is performed in a non-destructive manner, after the measurement the state is updated to $|\Phi_j\rangle$. The transition from $|\Psi\rangle$ to $|\Phi_j\rangle$ upon measurement is the notorious “collapse of the wave-packet”.
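A minimal sketch of the probability rule and the state-update rule for a finite-dimensional system, assuming state vectors are stored as plain Python lists of amplitudes in a fixed basis. The helper names are hypothetical, and the randomness of the outcome is simulated with Python's `random` module.

```python
import math
import random

def inner(phi, psi):
    """<phi|psi> for state vectors given as lists of (complex) amplitudes."""
    return sum(complex(a).conjugate() * b for a, b in zip(phi, psi))

def born_probabilities(psi, basis):
    """Outcome probabilities |<Phi_j|Psi>|^2 for an orthonormal basis {|Phi_j>}."""
    return [abs(inner(phi, psi)) ** 2 for phi in basis]

def measure(psi, basis, rng):
    """Sample outcome j with the probabilities above and return (j, |Phi_j>), the updated state."""
    probs = born_probabilities(psi, basis)
    j = rng.choices(range(len(basis)), weights=probs)[0]
    return j, basis[j]

Z_BASIS = [[1, 0], [0, 1]]
psi = [math.cos(math.pi / 8), math.sin(math.pi / 8)]
rng = random.Random(0)

print(born_probabilities(psi, Z_BASIS))  # approximately [0.854, 0.146]
j, collapsed = measure(psi, Z_BASIS, rng)
# Repeating the same measurement on the updated state reproduces outcome j with certainty.
print(born_probabilities(collapsed, Z_BASIS)[j])  # 1.0
```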

Since the measurement probabilities and the state-update rule only depend on the eigenbasis $\{|\Phi_j\rangle\}$ of the observable and not on the eigenvalues, they would remain the same if we measured a different observable with the same eigenbasis. For this reason, we shall often speak of measuring the basis $\{|\Phi_j\rangle\}$, which simply means measuring any non-degenerate observable that has this as its eigenbasis.

The dynamics of a closed quantum system is described by the Schrödinger equation

$$i\hbar \frac{d}{dt}|\Psi(t)\rangle = H|\Psi(t)\rangle, \tag{B2}$$

where $H$ is the Hamiltonian, which is a self-adjoint operator. If the Hamiltonian is constant, then this leads to the formal solution $|\Psi(t)\rangle = U(t - t_0)|\Psi(t_0)\rangle$, where $U(t) = e^{-iHt/\hbar}$. It is easy to check that $U(t)$ is unitary, which means that $U^\dagger(t)U(t) = U(t)U^\dagger(t) = I$, where $I$ is the identity operator on $\mathcal{H}$. Indeed, in principle any unitary operator can be obtained by an appropriate choice of $H$. If the Hamiltonian varies in time, then the dynamics is still given by a unitary operator $U(t)$, which is now given in terms of time-ordered exponentials of the Hamiltonian, but the key point is that it is still unitary. Because of these facts, we can simply say that the discrete-time dynamics of a closed quantum system is described by a unitary operator $U$.
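As an illustrative check of these statements, the sketch below takes a hypothetical qubit Hamiltonian $H = (\hbar\omega/2)\sigma_1$ (with $\sigma_1$ the Pauli matrix of Appendix C, and units with $\hbar = 1$ assumed), builds $U(t) = e^{-iHt/\hbar}$ from a truncated power series, and checks unitarity. Because $\sigma_1^2 = I$, the exact result is $U(t) = \cos(\omega t/2)\, I - i\sin(\omega t/2)\,\sigma_1$, which the series should reproduce.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def expm(M, terms=40):
    """exp(M) for a 2x2 complex matrix via a truncated Taylor series (adequate for small norms)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, M)]  # now M^n / n!
        result = [[r + t for r, t in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    return result

HBAR = 1.0   # units with hbar = 1 (assumption)
OMEGA = 2.0  # hypothetical angular frequency
SIGMA_1 = [[0, 1], [1, 0]]
H = [[HBAR * OMEGA / 2 * x for x in row] for row in SIGMA_1]  # hypothetical H = (hbar*omega/2) sigma_1

def U(t):
    """U(t) = exp(-i H t / hbar) for the constant Hamiltonian H above."""
    return expm([[-1j * t / HBAR * h for h in row] for row in H])

u = U(0.7)
identity_check = matmul(dagger(u), u)  # should be the 2x2 identity up to rounding
```

Composing $U(0.3)$ and $U(0.4)$ should likewise give $U(0.7)$, reflecting the group property of evolution under a constant Hamiltonian.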

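Finally, if a quantum system consists of two subsystems with Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, then the composite system has the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$. This means that, if $\{|\Psi_j\rangle\}$ is a basis for $\mathcal{H}_1$ and $\{|\Phi_k\rangle\}$ is a basis for $\mathcal{H}_2$, then $\mathcal{H}_1 \otimes \mathcal{H}_2$ is the Hilbert space spanned by the basis $\{|\Psi_j\rangle \otimes |\Phi_k\rangle\}$.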
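A minimal sketch of this construction, assuming vectors are written as coordinate lists in fixed bases so that $\otimes$ becomes the Kronecker product. The three-level second subsystem is chosen only for illustration, and the helper names are hypothetical. The dimensions multiply, and the product basis is again orthonormal.

```python
def kron(u, v):
    """Tensor (Kronecker) product of two vectors in the ordering |j>|k> -> index j*len(v) + k."""
    return [a * b for a in u for b in v]

def product_basis(basis_1, basis_2):
    """The basis {|Psi_j> (x) |Phi_k>} of H1 (x) H2 built from bases of H1 and H2."""
    return [kron(psi, phi) for psi in basis_1 for phi in basis_2]

def inner(u, v):
    """<u|v> for vectors given as lists of (complex) amplitudes."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

qubit_basis = [[1, 0], [0, 1]]                     # {|0>, |1>}
qutrit_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # a three-level system, for illustration
basis = product_basis(qubit_basis, qutrit_basis)
print(len(basis))  # 6, i.e. dim(H1) * dim(H2)
```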
Figure C1. Qubit Systems: The Bloch sphere (left) is the state space of the most elementary quantum system, the qubit. Every point on the sphere corresponds to a unique quantum state. The Mach-Zehnder interferometer (right) illustrates the process of interference in quantum mechanics. An incoming photon strikes a 50/50 beam-splitter I and splits into two different paths that later recombine (after mirror reflections) at a second beam-splitter II.


Appendix C. Qubits 

A qubit is any quantum system with a two-dimensional Hilbert space. For example, it may be the spin of a spin-1/2 particle, the polarization of a photon, or the energy levels of a two-level atom. We use the notation $|0\rangle = (1,0)^T$, $|1\rangle = (0,1)^T$ to represent the two standard basis states, which are the eigenstates of the Pauli operator

$$\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{C1}$$

For a spin-1/2 particle, these would be the spin-up and spin-down states in the z direction, and the notation $|z\uparrow\rangle = |0\rangle$, $|z\downarrow\rangle = |1\rangle$ is often used in this case, but bear in mind that the physical interpretation of these states depends on which instantiation of a qubit we are considering.

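The two other Pauli operators are

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \tag{C2}$$

which, for a spin-1/2 particle, represent spin in the x and y directions respectively. The eigenstates of $\sigma_1$ are $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$ and the eigenstates of $\sigma_2$ are $|\pm i\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm i|1\rangle)$. In the spin-1/2 instantiation, these are often alternatively written as $|x\uparrow\rangle = |+\rangle$, $|x\downarrow\rangle = |-\rangle$, $|y\uparrow\rangle = |+i\rangle$, and $|y\downarrow\rangle = |-i\rangle$.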
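The matrices (C1) and (C2) and the eigenstates listed above can be checked directly with a few lines of plain Python. This is a sketch assuming vectors written in the $\{|0\rangle, |1\rangle\}$ basis; the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
SIGMA_1 = [[0, 1], [1, 0]]
SIGMA_2 = [[0, -1j], [1j, 0]]
SIGMA_3 = [[1, 0], [0, -1]]

def apply(M, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def is_eigenvector(M, v, eigenvalue, tol=1e-12):
    """True if M v = eigenvalue * v up to rounding."""
    return all(abs(w - eigenvalue * x) < tol for w, x in zip(apply(M, v), v))

ket_0, ket_1 = [1, 0], [0, 1]
plus, minus = [S, S], [S, -S]                 # |+>, |->
plus_i, minus_i = [S, 1j * S], [S, -1j * S]   # |+i>, |-i>

print(is_eigenvector(SIGMA_3, ket_0, 1), is_eigenvector(SIGMA_3, ket_1, -1))     # True True
print(is_eigenvector(SIGMA_1, plus, 1), is_eigenvector(SIGMA_1, minus, -1))      # True True
print(is_eigenvector(SIGMA_2, plus_i, 1), is_eigenvector(SIGMA_2, minus_i, -1))  # True True
```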

The Pauli operators are paradigmatic examples of noncommuting operators. Their commutators are given by $[\sigma_1, \sigma_2] = 2i\sigma_3$, $[\sigma_2, \sigma_1] = -2i\sigma_3$, and cyclic permutations. However, noncommutativity is just a mathematical statement, so the following facts are often pointed to as physical correlates of noncommutativity.

• The Pauli observables are complementary to one another: if the system is in a state where $\sigma_3$ has a definite value, i.e. either $|0\rangle$ or $|1\rangle$, then measurements in the bases $\{|+\rangle, |-\rangle\}$ and $\{|+i\rangle, |-i\rangle\}$ are completely uncertain in the sense that each outcome has probability 1/2, and the same property holds under permutation of the Pauli operators.

• If several Pauli measurements are performed in a row then the outcome statistics depend on the order in which the observables are measured. For example, if the system is prepared in the state $|+\rangle$ and immediately measured in the basis $\{|+\rangle, |-\rangle\}$ then the $|+\rangle$ outcome will be obtained with certainty. However, if a measurement in the $\{|0\rangle, |1\rangle\}$ basis is performed first then each
outcome of a subsequent measurement in the $\{|+\rangle, |-\rangle\}$ basis occurs with probability 1/2. To see this, suppose that the $|0\rangle$ outcome is obtained in the first measurement. Then, the state gets updated to $|0\rangle$, which now has probability 1/2 of giving either outcome in a measurement in the $\{|+\rangle, |-\rangle\}$ basis. The same is true if the first measurement yields the $|1\rangle$ outcome. This phenomenon is often referred to by the phrase “measurements disturb the state of the system”, but it is better to simply say that the order of measurements can affect the outcome statistics.
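The commutation relations of the Pauli operators and the order dependence described in the second bullet can both be reproduced numerically. The sketch below is a minimal illustration in plain Python, assuming states and operators written in the $\{|0\rangle, |1\rangle\}$ basis; the helper names are hypothetical.

```python
import math

SIGMA_1 = [[0, 1], [1, 0]]
SIGMA_2 = [[0, -1j], [1j, 0]]
SIGMA_3 = [[1, 0], [0, -1]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA for 2x2 matrices."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def prob(phi, psi):
    """|<phi|psi>|^2 for two-component state vectors."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(phi, psi))) ** 2

S = 1 / math.sqrt(2)
Z_BASIS = [[1, 0], [0, 1]]
plus = [S, S]

print(commutator(SIGMA_1, SIGMA_2))  # should equal 2i * sigma_3

# |+> measured directly in {|+>, |->}: the |+> outcome is certain.
p_direct = prob(plus, plus)
# {|0>, |1>} measured first: sum over updated states |z> of P(z) * P(+ | z).
p_after_z = sum(prob(z, plus) * prob(plus, z) for z in Z_BASIS)
print(round(p_direct, 12), round(p_after_z, 12))  # 1.0 0.5
```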

Another paradigmatic quantum phenomenon is interference, and this can also be exhibited with the Pauli eigenstates. Consider the Mach-Zehnder interferometer with 50/50 beam-splitters illustrated in Figure C1 and suppose that a single photon is passed through it. The two input ports can be represented on a two-dimensional Hilbert space, where $|0\rangle$ represents a photon incident on the beam-splitter from the left side and $|1\rangle$ from the right. After the beam-splitter, we use the same labels for the transmitted beams and the opposite labels for the reflected beams, so $|0\rangle$ is used to represent a photon travelling along the right arm of the interferometer and $|1\rangle$ for the left, and after the second beam-splitter, $|0\rangle$ represents a photon in the left output beam and $|1\rangle$ the right. With these conventions, the transformation implemented by a beam-splitter is represented by the unitary matrix

$$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \tag{C3}$$

which maps $|0\rangle$ to $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle$ to $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, and the detectors perform a measurement in the $\{|0\rangle, |1\rangle\}$ basis, where $D_j$ firing corresponds to the $|j\rangle$ outcome.

First consider what happens when we remove the second beam-splitter and input the photon from the left of the first beam-splitter. Then, $|0\rangle$ gets mapped to $|+\rangle$ and there is a probability 1/2 that each of the detectors will fire. These statistics are consistent with the idea that, at each beam-splitter, the photon is either definitely transmitted or definitely reflected, with probability 1/2 each. However, if we replace the second beam-splitter, then $|+\rangle$ gets mapped to $|0\rangle$ before the detection and so $D_0$ will fire with certainty. In other words, there is constructive interference between the two beams at the left output port of the interferometer, and destructive interference at the right. This is not consistent with the idea that, at a beam-splitter, the photon always goes one way or the other with probability 1/2, since otherwise we would expect both detectors to fire with probability 1/2 in this case as well.
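The contrast drawn in this paragraph can be reproduced with a short sketch, assuming the basis conventions and the beam-splitter matrix (C3) above. For comparison it also includes a hypothetical classical “which-way” model in which each beam-splitter independently sends the photon either way with probability 1/2, so that probabilities rather than amplitudes are composed.

```python
import math

S = 1 / math.sqrt(2)
BEAM_SPLITTER = [[S, S], [S, -S]]  # the unitary matrix (C3)

def apply(M, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def detector_probs(amplitudes):
    """P(D_j fires) = |amplitude of |j>|^2."""
    return [abs(a) ** 2 for a in amplitudes]

photon_left = [1.0, 0.0]  # |0>: photon enters from the left

# One beam-splitter: 50/50 statistics at the detectors.
one_bs = detector_probs(apply(BEAM_SPLITTER, photon_left))
# Two beam-splitters: the amplitudes interfere and D_0 fires with certainty.
two_bs = detector_probs(apply(BEAM_SPLITTER, apply(BEAM_SPLITTER, photon_left)))

# Hypothetical classical which-way model: compose probabilities, not amplitudes.
STOCHASTIC = [[0.5, 0.5], [0.5, 0.5]]
classical_two_bs = apply(STOCHASTIC, apply(STOCHASTIC, photon_left))

print([round(q, 12) for q in one_bs])   # [0.5, 0.5]
print([round(q, 12) for q in two_bs])   # [1.0, 0.0]
print(classical_two_bs)                 # [0.5, 0.5]
```

The stochastic model agrees with quantum theory for a single beam-splitter but fails once the second beam-splitter is in place, which is the point made in the text.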

Finally, we also need to consider how a general qubit state is represented. In the $\{|0\rangle, |1\rangle\}$ basis, an arbitrary state of a qubit can be written, up to an irrelevant global phase, as

$$|\Psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle, \tag{C4}$$

where $0 \leq \theta \leq \pi$ and $-\pi < \phi \leq \pi$. Alternatively, because the Pauli operators, together with the identity, span the vector space of $2\times 2$ matrices, we can represent the state by its Bloch vector $\vec{\Psi} = (\langle\Psi|\sigma_1|\Psi\rangle, \langle\Psi|\sigma_2|\Psi\rangle, \langle\Psi|\sigma_3|\Psi\rangle) = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$. This shows that the states of a qubit are in one-to-one correspondence with points on the surface of the unit sphere in real three-dimensional space, which is known in this context as the Bloch sphere (see Figure C1).

Note that, due to the doubling of the angles in the Bloch sphere representation, states that are orthogonal in the Hilbert space are represented by pairs of antipodal points on the sphere. This means that an orthonormal basis $\{|\Phi\rangle, |\Phi^\perp\rangle\}$, representing a measurement, is represented by a pair $\{\vec{\Phi}, \vec{\Phi}^\perp\}$ of antipodal points on the sphere. The probabilities that quantum theory predicts for measurement outcomes can also be rewritten in terms of Bloch vectors via $|\langle\Phi|\Psi\rangle|^2 = \frac{1}{2}(1 + \vec{\Phi}\cdot\vec{\Psi})$.
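Both Bloch-sphere statements can be checked numerically. The sketch below, assuming plain Python complex arithmetic and arbitrarily chosen example angles, computes the Bloch vector of (C4) from expectation values of the Pauli operators, confirms the overlap formula for a second state, and checks that the state with angles $(\pi - \theta, \phi + \pi)$ is orthogonal to $|\Psi\rangle$ and has the antipodal Bloch vector. The helper names are hypothetical.

```python
import cmath
import math

PAULI = (
    [[0, 1], [1, 0]],     # sigma_1
    [[0, -1j], [1j, 0]],  # sigma_2
    [[1, 0], [0, -1]],    # sigma_3
)

def qubit_state(theta, phi):
    """|Psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>, as in (C4)."""
    return [complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2)]

def inner(u, v):
    """<u|v> for two-component complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def bloch_vector(psi):
    """(<Psi|sigma_1|Psi>, <Psi|sigma_2|Psi>, <Psi|sigma_3|Psi>)."""
    vec = []
    for s in PAULI:
        s_psi = [sum(s[i][k] * psi[k] for k in range(2)) for i in range(2)]
        vec.append(inner(psi, s_psi).real)  # real because each sigma is Hermitian
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta, phi = 1.1, -2.3  # arbitrary example angles
psi = qubit_state(theta, phi)
expected = [math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta)]
print(max(abs(a - b) for a, b in zip(bloch_vector(psi), expected)) < 1e-12)  # True

# Overlap formula |<Phi|Psi>|^2 = (1 + Phi . Psi) / 2 for another state Phi.
phi_state = qubit_state(0.4, 0.9)
lhs = abs(inner(phi_state, psi)) ** 2
rhs = (1 + dot(bloch_vector(phi_state), bloch_vector(psi))) / 2
print(abs(lhs - rhs) < 1e-12)  # True

# The orthogonal state corresponds to the antipodal point on the sphere.
perp = qubit_state(math.pi - theta, phi + math.pi)
print(abs(inner(perp, psi)) < 1e-12)  # True
```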
Appendix D. Continuous variable systems

In nonrelativistic quantum mechanics, the Hilbert space of a spinless particle in one dimension is spanned by the continuum of eigenstates $|x\rangle$ of the position operator $\hat{x}$, $\hat{x}|x\rangle = x|x\rangle$. A state $|\Psi\rangle$ is therefore written as

$$|\Psi\rangle = \int_{-\infty}^{\infty} dx\, \psi(x)\,|x\rangle, \tag{D1}$$

where $\psi(x) = \langle x|\Psi\rangle$ is the wave-function in position representation and $|\psi(x)|^2 dx$ is the probability that the outcome of a position measurement will be between $x$ and $x + dx$.

The momentum operator $\hat{p}$ is represented in the position representation as $\hat{p} = -i\hbar\,\frac{d}{dx}$ and, using this, we can alternatively represent $|\Psi\rangle$ in the basis of momentum eigenstates as

$$|\Psi\rangle = \int_{-\infty}^{\infty} dp\, \phi(p)\,|p\rangle, \tag{D2}$$

where $|\phi(p)|^2 dp$ is the probability that the outcome of a momentum measurement will be between $p$ and $p + dp$. The functions $\psi(x)$ and $\phi(p)$ are related by the Fourier transform.
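For concreteness, with the commonly used normalization $\langle x|p\rangle = (2\pi\hbar)^{-1/2} e^{ipx/\hbar}$ (a convention the text above does not fix), the Fourier relation between the two wave-functions reads:

```latex
\phi(p) = \frac{1}{\sqrt{2\pi\hbar}} \int_{-\infty}^{\infty} dx \, e^{-ipx/\hbar} \, \psi(x),
\qquad
\psi(x) = \frac{1}{\sqrt{2\pi\hbar}} \int_{-\infty}^{\infty} dp \, e^{ipx/\hbar} \, \phi(p).
```

The first identity follows from $\phi(p) = \langle p|\Psi\rangle = \int dx\, \langle p|x\rangle \langle x|\Psi\rangle$.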

Position and momentum are complementary variables, in the sense that if one of them is certain then the other is completely indeterminate. Further, they obey the Heisenberg uncertainty relation

$$\Delta x\,\Delta p \geq \frac{\hbar}{2}, \tag{D3}$$

where $\Delta x = \sqrt{\langle x^2\rangle - \langle x\rangle^2}$ and $\Delta p = \sqrt{\langle p^2\rangle - \langle p\rangle^2}$ are the standard deviations of position and momentum computed for any fixed state $|\Psi\rangle$.
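As a numerical illustration of (D3), the sketch below assumes units with $\hbar = 1$ and a hypothetical Gaussian wave packet $\psi(x) = (2\pi\sigma^2)^{-1/4} e^{-x^2/(4\sigma^2)}$, for which the bound is known to be saturated ($\Delta x = \sigma$, $\Delta p = \hbar/(2\sigma)$). Integrals are approximated by Riemann sums on a uniform grid, and $\langle p^2\rangle$ is computed as $\hbar^2\int|\psi'(x)|^2 dx$, which follows from $\hat{p} = -i\hbar\, d/dx$ after integrating by parts; $\langle p\rangle = 0$ because $\psi$ is real.

```python
import math

HBAR = 1.0   # units with hbar = 1 (assumption)
SIGMA = 0.7  # hypothetical position spread of the packet

def psi(x):
    """Real Gaussian wave packet whose density |psi|^2 has variance SIGMA**2."""
    return (2 * math.pi * SIGMA ** 2) ** -0.25 * math.exp(-x ** 2 / (4 * SIGMA ** 2))

N = 4001
L = 12 * SIGMA                     # grid half-width; the tails beyond are negligible
h = 2 * L / (N - 1)
xs = [-L + i * h for i in range(N)]
density = [psi(x) ** 2 for x in xs]

norm = sum(density) * h
mean_x = sum(x * d for x, d in zip(xs, density)) * h
var_x = sum(x * x * d for x, d in zip(xs, density)) * h - mean_x ** 2

# <p> = 0 for a real wave function, and <p^2> = hbar^2 * integral of |psi'(x)|^2 dx.
dpsi = [(psi(x + h) - psi(x - h)) / (2 * h) for x in xs]   # central differences
var_p = HBAR ** 2 * sum(d * d for d in dpsi) * h

product = math.sqrt(var_x) * math.sqrt(var_p)
print(product, HBAR / 2)  # the product should be very close to hbar/2
```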

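Although it may be less familiar, it is possible to represent a quantum state as a function $W(x,p)$ on phase space, known as the Wigner function, defined by

$$W(x,p) = \frac{1}{2\pi\hbar}\int_{-\infty}^{\infty} dy\, \psi^*\!\left(x + \frac{y}{2}\right)\psi\!\left(x - \frac{y}{2}\right) e^{ipy/\hbar}. \tag{D4}$$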


The Wigner function is like a probability density on phase space, except that it can take negative values. Nevertheless, its marginals give the correct probability densities over position and momentum, i.e.

$$\int_{-\infty}^{\infty} dp\, W(x,p) = |\psi(x)|^2, \qquad \int_{-\infty}^{\infty} dx\, W(x,p) = |\phi(p)|^2. \tag{D5}$$
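To illustrate that $W$ can indeed be negative, the sketch below evaluates (D4) numerically for the ground state and first excited state of a harmonic oscillator, assuming units with $\hbar = m = \omega = 1$, so that $\psi_0(x) = \pi^{-1/4} e^{-x^2/2}$ and $\psi_1(x) = \pi^{-1/4}\sqrt{2}\, x\, e^{-x^2/2}$. At the phase-space origin the known values are $W_0(0,0) = 1/\pi$ and $W_1(0,0) = -1/\pi$. The integral over $y$ is approximated by a Riemann sum, which is accurate here because the integrand decays like a Gaussian; the function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = m = omega = 1 (assumption)

def psi0(x):
    """Harmonic-oscillator ground state."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)

def psi1(x):
    """Harmonic-oscillator first excited state."""
    return math.pi ** -0.25 * math.sqrt(2) * x * math.exp(-x * x / 2)

def wigner(psi, x, p, y_max=20.0, steps=4000):
    """Riemann-sum evaluation of (D4); psi is assumed real, so psi* = psi."""
    dy = 2 * y_max / steps
    total = 0j
    for k in range(steps + 1):
        y = -y_max + k * dy
        total += psi(x + y / 2) * psi(x - y / 2) * cmath.exp(1j * p * y / HBAR)
    # W is real; .real discards the rounding-level imaginary part.
    return (total * dy / (2 * math.pi * HBAR)).real

print(wigner(psi0, 0.0, 0.0), 1 / math.pi)    # positive, about 0.3183
print(wigner(psi1, 0.0, 0.0), -1 / math.pi)   # negative, about -0.3183
```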

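A Gaussian quantum state is a state for which the Wigner function is a Gaussian. Gaussian states are an important subclass of quantum states that occur in a variety of applications; e.g. the coherent states produced by a laser are Gaussian. In order to write down a Gaussian state it is helpful to form the phase space vector $z = (x,p)^T$ and define the $2\times 2$ covariance matrix $\gamma$ with elements $\gamma_{jk} = \langle z_j z_k\rangle - \langle z_j\rangle\langle z_k\rangle$ (where, for quantum operators, the product $z_j z_k$ is understood in its symmetrized form $\frac{1}{2}(z_j z_k + z_k z_j)$, so that $\gamma$ is real and symmetric), i.e.

$$\gamma = \begin{pmatrix} (\Delta x)^2 & \langle xp\rangle - \langle x\rangle\langle p\rangle \\ \langle xp\rangle - \langle x\rangle\langle p\rangle & (\Delta p)^2 \end{pmatrix}, \tag{D6}$$

where $\Delta x$ and $\Delta p$ are the standard deviations defined above.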
Then, a Gaussian state has a Wigner function of the form

$$W(x,p) \propto \exp\left[-\frac{1}{2}(z - \langle z\rangle)^T \gamma^{-1} (z - \langle z\rangle)\right], \tag{D7}$$

where $\langle z\rangle = (\langle x\rangle, \langle p\rangle)^T$.
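A minimal sketch of (D7), assuming the usual normalization constant $1/(2\pi\sqrt{\det\gamma})$ for a two-dimensional Gaussian (the text only specifies $W$ up to proportionality). It checks numerically that integrating over $p$ gives a Gaussian position density with variance $\gamma_{xx}$, in line with (D5), and that in units with $\hbar = 1$ the choice $\gamma = (\hbar/2) I$, the covariance matrix of the harmonic-oscillator ground state, gives $W(x,p) = e^{-x^2-p^2}/\pi$. The example covariance matrix and function names are hypothetical.

```python
import math

def gaussian_wigner(x, p, mean, gamma):
    """Normalized Gaussian Wigner function of the form (D7).

    mean  -- (<x>, <p>)
    gamma -- 2x2 covariance matrix, assumed symmetric positive definite
    """
    (a, b), (c, d) = gamma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    delta_x, delta_p = x - mean[0], p - mean[1]
    quad = (delta_x * (inv[0][0] * delta_x + inv[0][1] * delta_p)
            + delta_p * (inv[1][0] * delta_x + inv[1][1] * delta_p))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def position_marginal(x, mean, gamma, p_max=30.0, steps=6000):
    """Riemann-sum approximation of the integral of W(x, p) over p, cf. (D5)."""
    step = 2 * p_max / steps
    return sum(gaussian_wigner(x, -p_max + k * step, mean, gamma) for k in range(steps + 1)) * step

mean = (0.2, -0.1)
gamma = [[0.8, 0.3], [0.3, 0.9]]  # example covariance matrix
x = 0.5
expected = math.exp(-(x - mean[0]) ** 2 / (2 * gamma[0][0])) / math.sqrt(2 * math.pi * gamma[0][0])
print(abs(position_marginal(x, mean, gamma) - expected) < 1e-9)  # True

vacuum = [[0.5, 0.0], [0.0, 0.5]]  # (hbar/2) * identity with hbar = 1
print(abs(gaussian_wigner(0.5, -0.3, (0.0, 0.0), vacuum) - math.exp(-0.34) / math.pi) < 1e-12)  # True
```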

In general, a continuous variable system with $N$ degrees of freedom is represented by a Hilbert space spanned by vectors $|\mathbf{x}\rangle$, where $\mathbf{x} \in \mathbb{R}^N$ is a configuration space point. A state $|\Psi\rangle$ is of the form

$$|\Psi\rangle = \int d^N\mathbf{x}\, \psi(\mathbf{x})\,|\mathbf{x}\rangle, \tag{D8}$$

where the wave-function $\psi(\mathbf{x}) = \langle\mathbf{x}|\Psi\rangle$ is now a function on configuration space and $|\psi(\mathbf{x})|^2$ is the probability density of finding the system in a given configuration.

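We can again define a Wigner function as

$$W(\mathbf{x},\mathbf{p}) = \frac{1}{(2\pi\hbar)^N}\int d^N\mathbf{y}\, \psi^*\!\left(\mathbf{x} + \frac{\mathbf{y}}{2}\right)\psi\!\left(\mathbf{x} - \frac{\mathbf{y}}{2}\right) e^{i\mathbf{p}\cdot\mathbf{y}/\hbar}, \tag{D9}$$

where the integration is now over the $N$-dimensional configuration space.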
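Defining the vector $z = (x_1, p_1, x_2, p_2, \ldots, x_N, p_N)^T$ and covariance matrix $\gamma_{jk} = \langle z_j z_k\rangle - \langle z_j\rangle\langle z_k\rangle$, a Gaussian state is again one that has a Wigner function of the form

$$W(\mathbf{x},\mathbf{p}) \propto \exp\left[-\frac{1}{2}(z - \langle z\rangle)^T \gamma^{-1} (z - \langle z\rangle)\right]. \tag{D10}$$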

Finally, there is also a notion of Gaussian measurements and Gaussian dynamics. A Gaussian measurement is one that, when performed on a Gaussian state, leaves the system in a Gaussian state after the measurement, and a Gaussian unitary operator is a unitary that maps Gaussian states to Gaussian states. Importantly, this is also required to hold when the operation is only applied to a subsystem. For example, when applied to a two-particle Gaussian state, performing a one-particle Gaussian transformation on one of the particles and doing nothing to the other should leave the system in a two-particle Gaussian state. By Gaussian Quantum Mechanics, we mean the sub-theory of continuous variable quantum mechanics in which states, measurements, and dynamics are all restricted to be Gaussian.